Cilk provides the "best overall productivity" for high performance computing: (and won the HPC challenge award to prove it)

Authors:
Bradley C. Kuszmaul
Affiliations:
MIT CSAIL
Venue:
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Year:
2007

Citing (1):
Locality of Reference in LU Decomposition with Partial Pivoting. SIAM Journal on Matrix Analysis and Applications.

Cited by (3):
Backtracking-based load balancing. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming.
A MapReduce-based distributed SVM algorithm for automatic image annotation. Computers & Mathematics with Applications.
An ontology enhanced parallel SVM for scalable spam filter training. Neurocomputing.

Abstract

My entry won the award for "Best Overall Productivity" in the 2006 HPC Challenge Class 2 (productivity) competition. I used the Cilk multithreaded programming language [1] to implement all six benchmarks: LU decomposition with partial pivoting, matrix multiplication, vector add, matrix transpose, updates of random locations in a large table, and a huge one-dimensional FFT. I measured performance on NASA's "Columbia" SGI Altix system. The programs achieved good performance (e.g., up to 943 Gflop/s on 256 processors for matrix multiplication). To transform the six C programs into Cilk programs, I added a total of only 137 keywords.
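To illustrate what "adding keywords to a C program" can look like, here is a minimal sketch of a divide-and-conquer vector add written with Cilk-5-style keywords (`cilk`, `spawn`, `sync`). It is not the author's benchmark code. The function name `vadd` and the cutoff `GRAIN` are hypothetical. The three empty `#define` lines produce the "serial elision": with the keywords defined away, the file compiles as ordinary C with the same meaning. To build it with a Cilk-5 compiler instead, one would delete those three lines and call `vadd` via `spawn` from another `cilk` procedure.

```c
#include <assert.h>

/* Serial elision (assumption: Cilk-5-style keywords). Defining the keywords
 * away turns this Cilk source into plain C with identical results; delete
 * these three lines to compile with a Cilk-5 compiler instead. */
#define cilk
#define spawn
#define sync

/* Hypothetical cutoff: below this length, stop splitting and just loop. */
#define GRAIN 1024L

/* Computes c[i] = a[i] + b[i] for 0 <= i < n by recursive halving.
 * The two halves write disjoint parts of c, so they may run in parallel. */
cilk void vadd(const double *a, const double *b, double *c, long n)
{
    assert(n >= 0);
    if (n <= GRAIN) {
        /* Base case: ordinary serial C loop. */
        for (long i = 0; i < n; i++)
            c[i] = a[i] + b[i];
        return;
    }
    long half = n / 2;
    spawn vadd(a, b, c, half);                          /* left half; may run in parallel */
    spawn vadd(a + half, b + half, c + half, n - half); /* right half */
    sync;                                               /* wait for both children */
}
```

Relative to the serial C version, this toy function gains four keywords: one `cilk`, two `spawn`, and one `sync`. Under the serial elision it is called like any C function, e.g. `vadd(a, b, c, n);`.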