Cloud computing, with its promise of virtually limitless resources, appears well suited to resource-intensive problems in machine learning and data mining, because it allows distributed applications in these domains to be scaled with little difficulty. However, to run on cloud infrastructure, such applications must first be reduced to frameworks that can exploit cloud resources effectively, such as Hadoop MapReduce, which offers both automatic parallelization and fault tolerance on commodity cloud hardware. Adapting complex algorithms to the MapReduce model is not trivial, as the model is best suited to simple, embarrassingly parallel algorithms. Yet some more complex algorithms do fit MapReduce, and in this work we look at one of them, Clustering LARge Applications (CLARA), which can be used to cluster very large numbers of objects. The paper describes how CLARA is reduced to the MapReduce model and gives a detailed analysis of its Hadoop MapReduce implementation. It also presents a case study in which the algorithm is successfully applied to clustering the pen-based recognition of handwritten digits data set.
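The abstract does not spell out the reduction itself, so the following is only a sketch of one plausible decomposition, not the paper's implementation. It assumes the usual description of CLARA: draw several random samples, run a k-medoids (PAM-style) search on each sample, then keep the medoid set with the lowest total distance over the full data set. Plain Python functions stand in for Hadoop map and reduce tasks, and every name here (`map_sample`, `map_cost`, `reduce_cost`, `clara`) is hypothetical.

```python
import random
from collections import defaultdict

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(sample, k, rng):
    """Simplified swap-based k-medoids search on one small sample."""
    medoids = rng.sample(sample, k)
    best = cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                c = cost(sample, trial)
                if c < best:  # accept any swap that lowers the cost
                    medoids, best, improved = trial, c, True
    return medoids

def map_sample(sample_id, sample, k, seed):
    """Assumed job 1 map task: cluster one sample independently."""
    return sample_id, pam(sample, k, random.Random(seed))

def map_cost(point, candidates):
    """Assumed job 2 map task: emit (candidate_id, nearest-medoid distance)."""
    for cid, medoids in candidates.items():
        yield cid, min(dist(point, m) for m in medoids)

def reduce_cost(pairs):
    """Assumed job 2 reduce task: total distance per candidate medoid set."""
    totals = defaultdict(float)
    for cid, d in pairs:
        totals[cid] += d
    return dict(totals)

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """Run both phases in-process and return (best medoids, total cost)."""
    rng = random.Random(seed)
    candidates = {}
    for sid in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        cid, medoids = map_sample(sid, sample, k, seed + sid)
        candidates[cid] = medoids
    pairs = (pair for p in points for pair in map_cost(p, candidates))
    totals = reduce_cost(pairs)
    best = min(totals, key=totals.get)
    return candidates[best], totals[best]
```

In this sketch the per-sample clustering and the per-point distance evaluation are both independent units of work, which is the property that makes a MapReduce formulation natural. The sample count, sample size, and swap strategy are illustrative choices only.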