We present performance results on a Windows cluster with up to 768 cores, using MPI and two threading models: CCR and TPL. CCR (Concurrency and Coordination Runtime) presents a message-based interface, while TPL (Task Parallel Library) allows for-loops to be parallelized automatically. MPI is used between the cluster nodes (up to 32), and either threading or MPI provides parallelism across the 24 cores of each node. We use a simple matrix multiplication kernel as well as a significant bioinformatics gene clustering application. We find that the two threading models offer similar performance; MPI outperforms both at low levels of parallelism, but threading performs much better when the grain size (problem size per process) is small. We find better performance on Intel than on AMD on comparable 24-core systems. We develop simple models for the performance of the clustering code.
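To make the notion of grain size concrete, the following is a minimal sketch of a row-partitioned matrix multiplication in which each worker thread computes one contiguous block of rows. It is an illustration only, not the kernel used in the study: the names multiply_rows and parallel_matmul, the workers parameter, and the choice of Python with a thread pool (rather than CCR, TPL, or MPI) are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the matrix product a * b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def parallel_matmul(a, b, workers=4):
    """Hypothetical helper: split the rows of `a` into blocks, one per worker."""
    n = len(a)
    if n == 0:
        return []
    workers = max(1, min(workers, n))
    # Grain size here is the number of rows per worker (last block may be smaller).
    grain = -(-n // workers)  # ceiling division
    bounds = [(start, min(start + grain, n)) for start in range(0, n, grain)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the row blocks come back in order.
        blocks = pool.map(lambda se: multiply_rows(a, b, se[0], se[1]), bounds)
        return [row for block in blocks for row in block]
```

Raising workers for a fixed matrix shrinks the grain size, which is the regime in which the abstract reports threading doing much better than MPI. Note that in standard CPython the global interpreter lock prevents these threads from speeding up pure-Python arithmetic, so the sketch shows only the decomposition, not the measured performance behavior.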