Subspace clustering for high dimensional categorical data

Authors: Guojun Gan; Jianhong Wu
Affiliations: York University, Toronto, Canada; York University, Toronto, Canada
Venue: ACM SIGKDD Explorations Newsletter
Year: 2004

Abstract

Data clustering has been discussed extensively, but almost all known conventional clustering algorithms tend to break down in high-dimensional spaces because of the inherent sparsity of the data points. Existing subspace clustering algorithms for handling high-dimensional data focus on numerical dimensions. In this paper, we design an iterative algorithm called SUBCAD for clustering high-dimensional categorical data sets, based on the minimization of an objective function for clustering. From this objective function we derive rules for changing cluster memberships. We also design an objective function to determine the subspace associated with each cluster, and we prove several of its properties that are essential for designing a fast algorithm to find that subspace. Finally, we report experiments that show the effectiveness of the proposed method and algorithm.
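The abstract does not state SUBCAD's objective functions, so the following is only a minimal sketch of the general pattern it describes: repeatedly change cluster memberships while a clustering objective decreases, then attach a subspace to each cluster. It assumes the simple matching dissimilarity and mode updates of k-modes as a stand-in for SUBCAD's objective, and a hypothetical agreement threshold (`min_agreement`) for choosing each cluster's subspace. The function names `cluster_mode`, `mismatches`, `k_modes` and `cluster_subspace` are illustrative, not the authors' method.

```python
from collections import Counter
import random


def cluster_mode(points, dim):
    """Per-attribute most frequent category; ties go to the value seen first."""
    return tuple(Counter(p[j] for p in points).most_common(1)[0][0] for j in range(dim))


def mismatches(point, center):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(point, center))


def k_modes(data, k, max_iter=50, seed=0):
    """k-modes-style loop (a stand-in for SUBCAD's objective, not the paper's method)."""
    dim = len(data[0])
    centers = random.Random(seed).sample(data, k)  # k records sampled without replacement
    labels = None
    for _ in range(max_iter):
        # Move each record to the center with the fewest mismatches (ties -> lowest index).
        new_labels = [min(range(k), key=lambda c: mismatches(p, centers[c])) for p in data]
        if new_labels == labels:
            break  # no membership changed: the iteration has reached a fixed point
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = cluster_mode(members, dim)
    return labels, centers


def cluster_subspace(members, center, min_agreement=0.8):
    """Hypothetical rule: keep attributes where at least min_agreement of members match the mode."""
    n = len(members)
    return [j for j in range(len(center))
            if sum(p[j] == center[j] for p in members) / n >= min_agreement]


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "r")]
    labels, centers = k_modes(data, k=2)
    for c, center in enumerate(centers):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if members:
            print(c, center, cluster_subspace(members, center))
```

Assignment in this sketch uses all attributes because comparing raw mismatch counts over subspaces of different sizes would favor clusters with small subspaces; the sketch sidesteps that by reporting each subspace only after the memberships have stabilized.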