MPI: A Message-Passing Interface Standard
MPI: A Message-Passing Interface Standard
Linear correlation discovery in databases: a data mining approach
Data & Knowledge Engineering
Using SCTP to hide latency in MPI programs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
DPSP: distributed progressive sequential pattern mining on the cloud
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Hi-index | 0.00 |
The computation of covariance and correlation matrices are critical to many data mining applications and processes. Unfortunately the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as QC and the Maronna method, have been proposed. However, existing algorithms for QC only give acceptable performance when the dimensionality of the matrix is in the hundreds; and the Maronna method is rarely used in practice because of its high computational cost.In this paper, we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms using a real data set of the gene expression of over 6,000 genes, giving rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors. We also compare the parallel behaviours of the two methods. After thorough experimentation, we conclude that for many data mining applications, both QC and Maronna are viable options. Less robust, but faster, QC is the recommended choice for small parallel platforms. On the other hand, the Maronna method is the recommended choice when a high degree of robustness is required, or when the parallel platform features a high number of processors.