To cluster the increasingly massive data sets that are now common in data and text mining, we propose a parallel implementation of the k-means clustering algorithm based on the message-passing model. The proposed algorithm exploits the inherent data parallelism in the k-means algorithm. We show analytically that the speedup and scaleup of our algorithm approach their optimal values as the number of data points increases. We implemented our algorithm on an IBM POWERparallel SP2 with a maximum of 16 nodes. On typical test data sets, we observe nearly linear relative speedups (for example, 15.62 on 16 nodes) and essentially linear scaleup in the size of the data set and in the number of desired clusters. For a 2-gigabyte test data set, our implementation drives the 16-node SP2 at more than 1.8 gigaflops.
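The data-parallel structure described above can be illustrated with a minimal, single-process Python sketch. It is not the authors' SP2/MPI code: the points are split round-robin across `n_procs` simulated processes, each computes per-cluster coordinate sums and counts over only its own points, and a hypothetical `allreduce` helper stands in for a global sum reduction such as `MPI_Allreduce` with `MPI_SUM`. The function names `local_stats`, `allreduce`, and `parallel_kmeans` are illustrative assumptions. Per iteration, only k×d sums, k counts, and one error term are combined, regardless of how many points each process holds. This is consistent with the claim that speedup approaches the optimal as the data set grows.

```python
import random


def local_stats(chunk, centroids):
    """Hypothetical per-process step: assign each local point to its nearest
    centroid and accumulate per-cluster coordinate sums, counts, and SSE."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for p in chunk:
        dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
        j = min(range(k), key=dists.__getitem__)  # index of nearest centroid
        counts[j] += 1
        sse += dists[j]
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts, sse


def allreduce(stats):
    """Hypothetical stand-in for a sum all-reduce (e.g. MPI_Allreduce with
    MPI_SUM): add up every process's partial statistics into global totals."""
    k, d = len(stats[0][0]), len(stats[0][0][0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for s, c, e in stats:
        for j in range(k):
            counts[j] += c[j]
            for t in range(d):
                sums[j][t] += s[j][t]
        sse += e
    return sums, counts, sse


def parallel_kmeans(points, k, n_procs, max_iter=100, tol=1e-9, seed=0):
    """Simulated data-parallel k-means (an illustrative sketch, not the paper's code)."""
    # Round-robin data distribution: simulated process r owns points[r::n_procs].
    chunks = [points[r::n_procs] for r in range(n_procs)]
    # Same initial centroids on every process (as if broadcast from one root).
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    prev_sse = float("inf")
    sse = prev_sse
    for _ in range(max_iter):
        # Local computation on each process's own chunk only.
        stats = [local_stats(chunk, centroids) for chunk in chunks]
        # One global reduction per iteration; its size does not depend on n.
        sums, counts, sse = allreduce(stats)
        # Every process applies the identical update; empty clusters keep their centroid.
        centroids = [
            [s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
            for j in range(k)
        ]
        if prev_sse - sse <= tol * sse:  # SSE has stopped decreasing
            break
        prev_sse = sse
    return centroids, sse
```

For example, `parallel_kmeans([(0.0,), (1.0,), (10.0,), (11.0,)], k=2, n_procs=2)` settles on centroids 0.5 and 10.5. Because every simulated process applies the same update to the same reduced totals, the result does not depend on `n_procs`, apart from floating-point summation order.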