One of the challenges in clustering is dealing with data distributed across separate repositories, because most clustering techniques require the data to be centralized. One such technique, k-means, has been named one of the most influential data mining algorithms because it is simple, scalable, and easily adapted to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome these limitations of k-means and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to that of the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared from two perspectives: a theoretical one, through asymptotic complexity analyses; and an experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results indicate which variant is more adequate for each application scenario.
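To illustrate how a distributed k-means update can yield a model identical to the centralized one, the following minimal sketch has each site send only per-cluster sums and counts, which a coordinator merges into new centroids. This is one common way to achieve that equivalence and is not necessarily the scheme used in the paper; the evolutionary search over the number of clusters and the initial prototypes is not shown. All function names (`local_statistics`, `distributed_kmeans_step`, `centralized_kmeans_step`) are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np

def assign(points, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for each point.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def local_statistics(points, centroids):
    # Runs at one data site: only per-cluster sums and counts leave the site.
    k, dim = centroids.shape
    labels = assign(points, centroids)
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=int)
    np.add.at(sums, labels, points)   # accumulate points into their cluster's sum
    np.add.at(counts, labels, 1)      # count points per cluster
    return sums, counts

def distributed_kmeans_step(sites, centroids):
    # Coordinator: merge the statistics from all sites and recompute centroids.
    stats = [local_statistics(points, centroids) for points in sites]
    total_sums = sum(s for s, _ in stats)
    total_counts = sum(c for _, c in stats)
    new_centroids = centroids.copy()
    nonempty = total_counts > 0       # empty clusters keep their old centroid
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids

def centralized_kmeans_step(points, centroids):
    # Reference update computed on the pooled data, for comparison.
    labels = assign(points, centroids)
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return new_centroids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(300, 2))
    sites = np.array_split(data, 3)   # three hypothetical repositories
    start = data[:4].copy()           # assumed initial prototypes, k = 4
    same = np.allclose(distributed_kmeans_step(sites, start),
                       centralized_kmeans_step(data, start))
    print("distributed step matches centralized step:", same)
```

Because the merged sums and counts are the same quantities the centralized update computes, the two steps agree up to floating-point rounding, while each site transmits only k sums and k counts per iteration instead of its raw data.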