Locality of Reference in LU Decomposition with Partial Pivoting
SIAM Journal on Matrix Analysis and Applications
Backtracking-based load balancing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A MapReduce-based distributed SVM algorithm for automatic image annotation
Computers & Mathematics with Applications
Hi-index | 0.00 |
My entry won award for "Best Overall Productivity" in the 2006 HPC Challenge Class 2 (productivity) competition. I used the Cilk multithreaded programming language [1] to implement all six of the benchmarks, including LU decomposition with partial pivoting, matrix multiplication, vector add, matrix transpose, updates of random locations in a large table, and a huge 1-dimensional FFT. I measured the performance on the NASA's "Columbia" SGI Altix system. The programs achieved good performance (e.g., up to 943Flops on 256 processors for matrix multiplication). I added a total of only 137 keywords to transform the six C programs into Cilk programs.