Amortized efficiency of list update and paging rules
Communications of the ACM
A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
The input/output complexity of sorting and related problems
Communications of the ACM
Fast Fourier transforms: a tutorial review and a state of the art
Signal Processing
Introduction to algorithms
FFTs in external or hierarchical memory
The Journal of Supercomputing
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Large-scale sorting in uniform memory hierarchies
Journal of Parallel and Distributed Computing - Special issue on parallel I/O systems
Deterministic distribution sort in shared and distributed memory multiprocessors
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Randomized algorithms
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications
Computer architecture (2nd ed.): a quantitative approach
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
The influence of caches on the performance of sorting
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
External memory algorithms and data structures
External memory algorithms
The Design and Analysis of Computer Algorithms
Extending the Hong-Kung Model to Memory Hierarchies
COCOON '95 Proceedings of the First Annual International Conference on Computing and Combinatorics
Towards an Optimal Bit-Reversal Permutation Program
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Hierarchical memory with block transfer
SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science
SFCS '90 Proceedings of the 31st Annual Symposium on Foundations of Computer Science
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Towards a theory of cache-efficient algorithms
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Computational power of pipelined memory hierarchies
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)
Cache-oblivious priority queue and graph algorithm applications
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Computation regrouping: restructuring programs for temporal data cache locality
ICS '02 Proceedings of the 16th international conference on Supercomputing
A locality-preserving cache-oblivious dynamic dictionary
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Cache oblivious search trees via binary trees of small height
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Optimal organizations for pipelined hierarchical memories
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
High-Performance Algorithm Engineering for Computational Phylogenetics
The Journal of Supercomputing - Special issue on computational issues in fluid dynamics optimization and simulation
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
High-Performance Algorithm Engineering for Computational Phylogenetics
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Optimizing Graph Algorithms for Improved Cache Performance
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Exponential Structures for Efficient Cache-Oblivious Algorithms
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Cache Oblivious Distribution Sweeping
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Funnel Heap - A Cache Oblivious Priority Queue
ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
Automatic Generation of Block-Recursive Codes
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Optimised Predecessor Data Structures for Internal Memory
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
From Algorithm to Program to Software Library
Informatics - 10 Years Back. 10 Years Ahead.
Analysing the Cache Behaviour of Non-uniform Distribution Sorting Algorithms
ESA '00 Proceedings of the 8th Annual European Symposium on Algorithms
Efficient Tree Layout in a Multilevel Memory Hierarchy
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Scanning and Traversing: Maintaining Data for Traversals in a Memory Hierarchy
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Cache-oblivious data structures for orthogonal range searching
Proceedings of the nineteenth annual symposium on Computational geometry
External memory data structures
Handbook of massive data sets
On the limits of cache-obliviousness
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Algorithm engineering for parallel computation
Experimental algorithmics
A comparison of cache aware and cache oblivious static search trees using program instrumentation
Experimental algorithmics
Efficient sorting using registers and caches
Journal of Experimental Algorithmics (JEA)
Adapting Radix Sort to the Memory Hierarchy
Journal of Experimental Algorithmics (JEA)
Proximity Mergesort: optimal in-place sorting in the cache-oblivious model
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Approximate minimum enclosing balls in high dimensions using core-sets
Journal of Experimental Algorithmics (JEA)
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Cache-oblivious shortest paths in graphs using buffer heap
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Implicit B-trees: a new data structure for the dictionary problem
Journal of Computer and System Sciences - Special issue on FOCS 2002
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
Interactive Exploration of Large Remote Micro-CT Scans
VIS '04 Proceedings of the conference on Visualization '04
Journal of Experimental Algorithmics (JEA)
A locality-preserving cache-oblivious dynamic dictionary
Journal of Algorithms
Cache-oblivious planar orthogonal range searching and counting
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
External-memory exact and approximate all-pairs shortest-paths in undirected graphs
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
ACM SIGGRAPH 2005 Papers
Concurrent cache-oblivious B-trees
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Think globally, search locally
Proceedings of the 19th annual international conference on Supercomputing
Cache oblivious stencil computations
Proceedings of the 19th annual international conference on Supercomputing
Implicit dictionaries with O(1) modifications per update and fast search
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Cache-oblivious string dictionaries
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Cache-oblivious dynamic programming
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A computational study of external-memory BFS algorithms
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
Simple and semi-dynamic structures for cache-oblivious planar orthogonal range searching
Proceedings of the twenty-second annual symposium on Computational geometry
Architecture-conscious hashing
DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The cache-oblivious Gaussian elimination paradigm: theoretical framework and experimental evaluation
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
The cache complexity of multithreaded cache oblivious algorithms
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Fast additions on masked integers
ACM SIGPLAN Notices
Analyzing block locality in Morton-order and Morton-hybrid matrices
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Cache-conscious frequent pattern mining on modern and emerging processors
The VLDB Journal — The International Journal on Very Large Data Bases
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
Proceedings of the 2006 workshop on Memory system performance and correctness
Cache-oblivious nested-loop joins
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
A data structure for a sequence of string accesses in external memory
ACM Transactions on Algorithms (TALG)
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Cache-Friendly implementations of transitive closure
Journal of Experimental Algorithmics (JEA)
Linear work suffix array construction
Journal of the ACM (JACM)
Engineering a cache-oblivious sorting algorithm
Journal of Experimental Algorithmics (JEA)
Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
The memory behavior of cache oblivious stencil computations
The Journal of Supercomputing
Cache oblivious algorithms for nonserial polyadic programming
The Journal of Supercomputing
Models for parallel and hierarchical computation
Proceedings of the 4th international conference on Computing frontiers
On the importance of cache tuning in a cache-aware algorithm: A case study
Computers & Mathematics with Applications
Balanced allocation and dictionaries with tightly packed constant size bins
Theoretical Computer Science
Optimal sparse matrix dense vector multiplication in the I/O-model
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Cache-oblivious streaming B-trees
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An experimental comparison of cache-oblivious and cache-conscious programs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
HAT-trie: a cache-conscious trie-based data structure for strings
ACSC '07 Proceedings of the thirtieth Australasian conference on Computer science - Volume 62
Representation-transparent matrix algorithms with scalable performance
Proceedings of the 21st annual international conference on Supercomputing
Adaptive Strassen's matrix multiplication
Proceedings of the 21st annual international conference on Supercomputing
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Multithreaded programming in Cilk
Proceedings of the 2007 international workshop on Parallel symbolic computation
Making deterministic signatures quickly
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A faster cache-oblivious shortest-path algorithm for undirected graphs with bounded edge lengths
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Analyzing block locality in Morton-order and Morton-hybrid matrices
ACM SIGARCH Computer Architecture News
Alternation and redundancy analysis of the intersection problem
ACM Transactions on Algorithms (TALG)
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient gather and scatter operations on graphics processors
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A general framework for improving query processing performance on multi-level memory hierarchies
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Cache-oblivious databases: Limitations and opportunities
ACM Transactions on Database Systems (TODS)
An experimental study of sorting and branch prediction
Journal of Experimental Algorithmics (JEA)
Terracost: Computing least-cost-path surfaces for massive grid terrains
Journal of Experimental Algorithmics (JEA)
On searching compressed string collections cache-obliviously
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Cache-efficient dynamic programming algorithms for multicores
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Combating I-O bottleneck using prefetching: model, algorithms, and ramifications
The Journal of Supercomputing
On the limits of cache-oblivious rational permutations
Theoretical Computer Science
Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science
Massive model visualization techniques: course notes
ACM SIGGRAPH 2008 classes
Computing visibility on terrains in external memory
Journal of Experimental Algorithmics (JEA)
I/O Efficient Dynamic Data Structures for Longest Prefix Queries
SWAT '08 Proceedings of the 11th Scandinavian workshop on Algorithm Theory
A Bridging Model for Multi-core Computing
ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Cache-Oblivious Red-Blue Line Segment Intersection
ESA '08 Proceedings of the 16th annual European symposium on Algorithms
International Journal of Computational Science and Engineering
Cache-oblivious selection in sorted X+Y matrices
Information Processing Letters
B-tries for disk-based string management
The VLDB Journal — The International Journal on Very Large Data Bases
Making deterministic signatures quickly
ACM Transactions on Algorithms (TALG)
Optimal in-place algorithms for 3-D convex hulls and 2-D segment intersection
Proceedings of the twenty-fifth annual symposium on Computational geometry
Cache-oblivious range reporting with optimal queries requires superlinear space
Proceedings of the twenty-fifth annual symposium on Computational geometry
A general approach for cache-oblivious range reporting and approximate range counting
Proceedings of the twenty-fifth annual symposium on Computational geometry
Harnessing parallelism in multicore clusters with the all-pairs and wavefront abstractions
Proceedings of the 18th ACM international symposium on High performance distributed computing
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
Assembling approximately optimal binary search trees efficiently using arithmetics
Information Processing Letters
Lists revisited: Cache-conscious STL lists
Journal of Experimental Algorithmics (JEA)
Design and Engineering of External Memory Traversal Algorithms for General Graphs
Algorithmics of Large and Complex Networks
On Cartesian Trees and Range Minimum Queries
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Brief announcement: low depth cache-oblivious sorting
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Communication-optimal parallel and sequential Cholesky decomposition: extended abstract
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Masking patterns in sequences: A new class of motif discovery with don't cares
Theoretical Computer Science
Relational query coprocessing on graphics processors
ACM Transactions on Database Systems (TODS)
Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency
Proceedings of the 46th Annual Design Automation Conference
Cache-optimal algorithms for option pricing
ACM Transactions on Mathematical Software (TOMS)
Sorting on architecturally diverse computer systems
Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
Improved visibility computation on massive grid terrains
Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
Engineering burstsort: Toward fast in-place string sorting
Journal of Experimental Algorithmics (JEA)
Randomized multi-pass streaming skyline algorithms
Proceedings of the VLDB Endowment
Computational Geometry: Theory and Applications
Algorithms for memory hierarchies: advanced lectures
Bureaucratic protocols for secure two-party sorting, selection, and permuting
ASIACCS '10 Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security
Optimal cache-oblivious implicit dictionaries
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
The worst page-replacement policy
FUN'07 Proceedings of the 4th international conference on Fun with algorithms
Is cache-oblivious DGEMM viable?
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
On the limits of cache-oblivious matrix transposition
TGC'06 Proceedings of the 2nd international conference on Trustworthy global computing
I/O-efficient map overlay and point location in low-density subdivisions
ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Engineering burstsort: towards fast in-place string sorting
WEA'08 Proceedings of the 7th international conference on Experimental algorithms
Characterizing the performance of flash memory storage devices and its impact on algorithm design
WEA'08 Proceedings of the 7th international conference on Experimental algorithms
I/O-efficient point location in a set of rectangles
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics
Cache-oblivious ray reordering
ACM Transactions on Graphics (TOG)
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Journal of Discrete Algorithms
Static reuse distances for locality-based optimizations in MATLAB
Proceedings of the 24th ACM International Conference on Supercomputing
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
The Cilkview scalability analyzer
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Optimal in-place and cache-oblivious algorithms for 3-d convex hulls and 2-d segment intersection
Computational Geometry: Theory and Applications
A general approach for cache-oblivious range reporting and approximate range counting
Computational Geometry: Theory and Applications
Balanced dense polynomial multiplication on multi-cores
ACM Communications in Computer Algebra
Parallel computation of the minimal elements of a poset
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Cache-oblivious polygon indecomposability testing
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Cache friendly sparse matrix-vector multiplication
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Cache-Oblivious Dynamic Programming for Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Fast field solver for the simulation of large-area OLEDs
Microelectronics Journal
Additive spanners and (α, β)-spanners
ACM Transactions on Algorithms (TALG)
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Cache-oblivious dynamic dictionaries with update/query tradeoffs
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
ISB-tree: A new indexing scheme with efficient expected behaviour
Journal of Discrete Algorithms
Resource oblivious sorting on multicores
ICALP'10 Proceedings of the 37th international colloquium on Automata, languages and programming
Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Fast prefix search in little space, with applications
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
A bridging model for multi-core computing
Journal of Computer and System Sciences
Cache-oblivious simulation of parallel programs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Improving locality of nonserial polyadic dynamic programming
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Algorithm engineering: bridging the gap between algorithm theory and practice
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Cache complexity and multicore implementation for univariate real root isolation
ACM Communications in Computer Algebra
Cache friendly sparse matrix-vector multiplication
ACM Communications in Computer Algebra
Patterns for cache optimizations on multi-processor machines
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Three layer cake for shared-memory programming
Proceedings of the 2010 Workshop on Parallel Programming Patterns
autopin: automated optimization of thread-to-core pinning on multicore systems
Transactions on high-performance embedded architectures and compilers III
Cache-oblivious index for approximate string matching
Theoretical Computer Science
Graph expansion and communication costs of fast matrix multiplication: regular submission
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Scheduling irregular parallel computations on hierarchical caches
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Balance principles for algorithm-architecture co-design
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Stratified B-trees and versioned dictionaries
HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
Proceedings of the 8th ACM International Conference on Computing Frontiers
On the weak prefix-search problem
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
ACM Transactions on Algorithms (TALG)
Two-dimensional cache-oblivious sparse matrix-vector multiplication
Parallel Computing
Communication-optimal Parallel and Sequential Cholesky Decomposition
SIAM Journal on Scientific Computing
Algorithmic ramifications of prefetching in memory hierarchy
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Lists revisited: cache conscious STL lists
WEA'06 Proceedings of the 5th international conference on Experimental Algorithms
I/O-efficient data structures for colored range and prefix reporting
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
External string sorting: faster and cache-oblivious
STACS'06 Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science
A cache oblivious algorithm for matrix multiplication based on Peano's space filling curve
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Multi-DaC programming model: a variant of multi-BSP model for divide-and-conquer algorithms
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Cache-oblivious planar shortest paths
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Cache-aware and cache-oblivious adaptive sorting
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Cache-oblivious comparison-based algorithms on multisets
ESA'05 Proceedings of the 13th annual European conference on Algorithms
Out-of-Core Computations of High-Resolution Level Sets by Means of Code Transformation
Journal of Scientific Computing
FFT-based dense polynomial arithmetic on multi-cores
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
ISB-tree: a new indexing scheme with efficient expected behaviour
ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Improved space bounds for cache-oblivious range reporting
Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
JuliusC: a practical approach for the analysis of divide-and-conquer algorithms
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A paradigm for parallel matrix algorithms: scalable Cholesky
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
A family of high-performance matrix multiplication algorithms
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/L + mnp/(L\sqrt{Z}))$ cache faults.

We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels, and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
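The abstract does not spell out the algorithms themselves. As a rough illustration of the cache-oblivious idea for rectangular matrix transpose, the following is a minimal divide-and-conquer sketch: it repeatedly halves the larger dimension until the block is small, and never consults the cache size or line length. The function names and the `base` cutoff are assumptions made for this example; `base` only limits recursion overhead and is not a tuned hardware parameter. Python lists of lists do not have the contiguous row-major layout that a cache-miss analysis assumes, so this shows the recursion structure only, not the performance.

```python
def transpose_block(a, b, r0, r1, c0, c1, base=8):
    """Write the transpose of the block a[r0:r1][c0:c1] into b (b[j][i] = a[i][j])."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Halve the row range (the larger dimension) and recurse.
        mid = r0 + rows // 2
        transpose_block(a, b, r0, mid, c0, c1, base)
        transpose_block(a, b, mid, r1, c0, c1, base)
    else:
        # Halve the column range and recurse.
        mid = c0 + cols // 2
        transpose_block(a, b, r0, r1, c0, mid, base)
        transpose_block(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose_block(a, b, 0, m, 0, n)
    return b
```

For example, `cache_oblivious_transpose([[1, 2, 3], [4, 5, 6]])` returns the 3 x 2 matrix `[[1, 4], [2, 5], [3, 6]]`. Because the recursion eventually produces sub-blocks that fit in any given cache level, the same code needs no per-machine tuning, which is the property the abstract calls cache obliviousness.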