Cache-efficient dynamic programming algorithms for multicores

Authors:
Rezaul Alam Chowdhury; Vijaya Ramachandran
Affiliations:
The University of Texas at Austin, Austin, TX, USA (both authors)
Venue:
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Year:
2008

Abstract

We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP, with a private cache for each core; S-CMP, with a single cache shared by all cores; and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), the Gaussian Elimination Paradigm (GEP), and the parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
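To make the idea of tiling an LDDP problem concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes the longest common subsequence (LCS) length as a representative LDDP instance, since each cell depends only on its left, upper, and upper-left neighbours. The function name `lcs_length_tiled` and the fixed `tile` parameter are hypothetical, and the waves are executed sequentially here; the sketch only shows that tiles on the same anti-diagonal have no dependencies on each other and so could be assigned to different cores.

```python
def lcs_length_tiled(x, y, tile=2):
    """LCS length of x and y, filled tile by tile in anti-diagonal waves.

    Illustrative only: a fixed square tile size is an assumption made for
    this sketch, not the tiling sequence described in the abstract.
    """
    n, m = len(x), len(y)
    # c[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    def fill_tile(ti, tj):
        # Standard LCS recurrence restricted to one tile of the table.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                else:
                    c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Wave w holds all tiles with ti + tj == w. Each such tile reads only
    # from tiles in earlier waves, so tiles within a wave are independent.
    for wave in range(tiles_i + tiles_j - 1):
        first = max(0, wave - tiles_j + 1)
        last = min(wave, tiles_i - 1)
        for ti in range(first, last + 1):
            fill_tile(ti, wave - ti)

    return c[n][m]
```

In this sketch the number of waves, `tiles_i + tiles_j - 1`, plays the role of the critical path at tile granularity: a smaller tile exposes more parallelism per wave, while a larger tile keeps more of each core's working set in cache.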