Subspace clustering of high-dimensional data: an evolutionary approach

Authors:
Singh Vijendra; Sahoo Laxman
Affiliations:
Department of Computer Science and Engineering, Mody Institute of Technology and Science, Rajasthan, India; School of Computer Engineering, KIIT University, Bhubaneswar, India
Venue:
Applied Computational Intelligence and Soft Computing
Year:
2013

Abstract

Clustering high-dimensional data is a major challenge because the points are inherently sparse. Most existing clustering algorithms become substantially inefficient when the required similarity measure is computed between data points in the full-dimensional space. In this paper, we present a robust multiobjective subspace clustering (MOSCL) algorithm for the challenging problem of high-dimensional clustering. The first phase of MOSCL performs subspace relevance analysis by detecting dense and sparse regions and their locations in the data set. After detecting the dense regions, it eliminates outliers. MOSCL then discovers subspaces in the dense regions of the data set and produces subspace clusters. In thorough experiments on synthetic and real-world data sets, we demonstrate that MOSCL is superior to the PROCLUS clustering algorithm for subspace clustering. Additionally, we investigate how the first phase, which detects dense regions, affects the results of subspace clustering. Our results indicate that removing outliers improves the accuracy of subspace clustering. The clustering results are validated by the clustering error (CE) distance on various data sets. MOSCL can discover clusters in all subspaces with high quality, and it is more efficient than PROCLUS.
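
The abstract names the clustering error (CE) distance as the validation measure but does not spell it out. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes each subspace cluster is given as a pair (set of point ids, set of dimension ids), that clusters within one clustering do not overlap, and that CE is computed as (|U| - D_max) / |U|, where U is the union of all (point, dimension) pairs covered by either clustering and D_max is the largest total overlap achievable by a one-to-one matching of clusters. The function names `micro_objects` and `clustering_error` are hypothetical, and the matching is found by brute force, so this is only practical for a handful of clusters.

```python
from itertools import permutations


def micro_objects(cluster):
    """Expand a subspace cluster (points, dims) into its (point, dim) pairs."""
    points, dims = cluster
    return {(p, d) for p in points for d in dims}


def clustering_error(found, truth):
    """CE distance between two subspace clusterings (0.0 means identical).

    Assumption: clusters inside each clustering do not overlap, so a plain
    set union is enough to count the covered (point, dim) pairs.
    """
    a = [micro_objects(c) for c in found]
    b = [micro_objects(c) for c in truth]
    union = set().union(*a, *b)
    if not union:
        return 0.0

    # Make `a` the shorter list so every cluster in it gets a distinct partner.
    if len(a) > len(b):
        a, b = b, a

    # Brute-force the best one-to-one matching; cost grows factorially.
    d_max = max(
        sum(len(x & b[j]) for x, j in zip(a, perm))
        for perm in permutations(range(len(b)), len(a))
    )
    return (len(union) - d_max) / len(union)


if __name__ == "__main__":
    truth = [({0, 1, 2}, {0, 1}), ({3, 4}, {2})]
    found = [({0, 1}, {0, 1}), ({3, 4}, {2})]  # misses point 2 in the first cluster
    print(clustering_error(found, truth))
```

In the usage example, the found clustering covers 6 of the 8 (point, dimension) pairs in the ground truth, so the distance is 2/8. For larger numbers of clusters, the brute-force search would normally be replaced by an assignment solver such as the Hungarian method.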