Writing software for one parallel system is a feasible though arduous task. Reusing the substantial intellectual effort so expended to program a second system has proved much more challenging. In sequential computing, algorithms textbooks and portable software are resources that enable software systems to be written that are efficiently portable across changing hardware platforms. These resources are currently lacking in the area of multi-core architectures, where a programmer seeking high performance has no comparable opportunity to build on the intellectual efforts of others. To address this problem, we propose a bridging model aimed at capturing the most basic resource parameters of multi-core architectures. We suggest that the considerable intellectual effort needed for designing efficient algorithms for such architectures may be most fruitfully expended in designing portable algorithms, once and for all, for such a bridging model. Portable algorithms would contain efficient designs for all reasonable combinations of the basic resource parameters and input sizes, and would form the basis for implementation or compilation for particular machines. Our Multi-BSP model is a multi-level model that has explicit parameters for processor numbers, memory/cache sizes, communication costs, and synchronization costs. The lowest level corresponds to shared memory or the PRAM, acknowledging the relevance of that model within whatever limits on memory and processor numbers it can be efficaciously emulated. We propose parameter-aware portable algorithms that run efficiently on all relevant architectures with any number of levels and any combination of parameters. For these algorithms we define a parameter-free notion of optimality.
We show that for several fundamental problems, including standard matrix multiplication, the Fast Fourier Transform, and comparison sorting, there exist optimal portable algorithms in that sense, for all combinations of machine parameters. Thus some algorithmic generality and elegance can be found in this many-parameter setting.
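To make the idea of a multi-level model with explicit parameters more concrete, the following Python sketch shows one way such a machine description might be represented. It is illustrative only: the abstract names the four kinds of parameter (processor numbers, memory/cache sizes, communication costs, synchronization costs) but does not define symbols or a cost formula, so the field names `p`, `g`, `L`, `m`, the helper functions, and the example values are all assumptions, with the cost expression borrowed from the classic single-level BSP style.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Level:
    """One level of a hypothetical multi-level machine description.

    Field names are assumptions chosen for illustration; the abstract
    only says that these four kinds of parameter exist at each level.
    """
    p: int    # number of sub-components (processors at the lowest level)
    g: float  # communication cost per word moved at this level
    L: float  # synchronization cost at this level
    m: int    # memory/cache size at this level, in words


def total_processors(levels):
    """Total processor count: the product of the per-level fan-outs."""
    return prod(level.p for level in levels)


def superstep_cost(level, work, words):
    """Assumed classic BSP-style cost of one superstep at a given level:
    local work, plus communication, plus one synchronization."""
    return work + words * level.g + level.L


# Hypothetical two-level machine: 8 chips, each with 4 cores sharing a cache.
machine = [
    Level(p=4, g=1.0, L=10.0, m=32_000),      # cores sharing a cache
    Level(p=8, g=4.0, L=100.0, m=8_000_000),  # chips sharing main memory
]

print(total_processors(machine))            # 32
print(superstep_cost(machine[0], 100, 20))  # 130.0
```

A portable algorithm in the sense described above would take such a list of levels as input and choose its block sizes and communication pattern accordingly, rather than being tuned for one fixed machine.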