Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by advances in technology makes clustering very large datasets a challenging task. To address this problem, many researchers have sought to design efficient parallel clustering algorithms. In this paper, we propose a parallel k-means clustering algorithm based on MapReduce, a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm scales well and processes large datasets efficiently on commodity hardware.
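The abstract does not spell out how k-means is split into map and reduce steps, so the following is only a minimal single-process sketch of one common formulation, not necessarily the authors' design. It assumes the map step assigns each point to its nearest centroid and the reduce step averages the points of each cluster. The function names (map_point, combine, reduce_cluster, kmeans_iteration, kmeans) are hypothetical, and the shuffle phase is simulated with a dictionary rather than a real MapReduce runtime.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map (assumed): emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Add two (vector sum, count) partial results for the same cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def reduce_cluster(partials):
    """Reduce (assumed): merge partial sums and return the new centroid."""
    total, count = partials[0]
    for partial in partials[1:]:
        total, count = combine((total, count), partial)
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """One k-means iteration; a dict stands in for the shuffle phase."""
    grouped = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        grouped[key].append(value)
    # A centroid that attracted no points keeps its old position.
    return [
        reduce_cluster(grouped[i]) if i in grouped else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until centroids move by at most tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(c, u) <= tol for c, u in zip(centroids, updated)):
            return updated
        centroids = updated
    return centroids
```

In a real deployment, each iteration would be one MapReduce job, with the current centroids shipped to every mapper and combine applied locally to cut down the data sent to reducers. Those are assumptions about a typical setup, not details taken from the abstract.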