Evolutionary k-means for distributed data sets

Authors:
M. C. Naldi;R. J. G. B. Campello
Venue:
Neurocomputing
Year:
2014

Abstract

A central challenge in clustering is handling data distributed across separate repositories, because most clustering techniques require the data to be centralized. One such technique, k-means, has been ranked among the most influential data mining algorithms because it is simple, scalable, and easily adapted to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome these limitations of k-means and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to that of the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared from two perspectives: a theoretical one, through asymptotic complexity analyses, and an experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results indicate which variant is more suitable for each application scenario.
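
To illustrate how a distributed k-means step can reproduce the centralized result, the following is a minimal sketch and not the authors' algorithm. It assumes each site holds a disjoint subset of the data and sends only per-cluster coordinate sums and counts to a coordinator. The names `nearest`, `local_stats`, and `distributed_step` are hypothetical, and the evolutionary search over initial prototypes and the number of clusters is not shown.

```python
from typing import List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_stats(data: List[Point], centroids: List[Point]):
    """Per-site step: per-cluster coordinate sums and counts for local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x in data:
        j = nearest(x, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def distributed_step(sites: List[List[Point]], centroids: List[Point]):
    """One k-means iteration in which sites share only sums and counts."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for data in sites:  # in a real deployment each site would run remotely
        sums, counts = local_stats(data, centroids)
        for j in range(len(centroids)):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # Assumption: an empty cluster simply keeps its previous centroid.
    return [
        [s / total_counts[j] for s in total_sums[j]]
        if total_counts[j]
        else list(centroids[j])
        for j in range(len(centroids))
    ]


# Three sites, two initial prototypes (illustrative values only).
sites = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0), (11.0, 10.0)], [(0.0, 1.0)]]
start = [(0.0, 0.0), (10.0, 10.0)]
pooled = [[x for site in sites for x in site]]  # the centralized equivalent
distributed_result = distributed_step(sites, start)
centralized_result = distributed_step(pooled, start)
```

Because the centroid update depends only on sums and counts, aggregating them across sites gives the same centroids as running the step on the pooled data, up to floating-point rounding from a different summation order. The second approach described in the abstract, which combines clusters selected independently at each site, generally does not carry this guarantee.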