Hierarchical density-based clustering of categorical data and a simplification

Authors:
Bill Andreopoulos;Aijun An;Xiaogang Wang
Affiliations:
York University, Dept. of Computer Science and Engineering, Toronto, Ontario, Canada;York University, Dept. of Computer Science and Engineering, Toronto, Ontario, Canada;York University, Dept. of Computer Science and Engineering, Toronto, Ontario, Canada
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007



Abstract

A challenge in applying density-based clustering to categorical datasets is that the 'cube' of attribute values has no ordering defined. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that balance the tradeoff between accuracy and speed. HIERDENC has the following characteristics: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the number of user-specified input parameters, (iii) it is insensitive to the order in which objects are input, and (iv) it can handle outliers. We evaluate HIERDENC on standard low-dimensional categorical datasets, on which it produces more accurate results than other algorithms. We also present a faster simplification of HIERDENC called the MULIC algorithm. MULIC performs better than subspace clustering algorithms at finding the multi-layered structure of special datasets.
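
To illustrate why the missing ordering matters, the sketch below shows one common way to define a density for categorical objects: count how many objects differ from a chosen center in at most a given number of attributes (a Hamming-distance neighborhood). This is an illustrative assumption, not the HIERDENC or MULIC procedure; the function names, the toy dataset, and the radius value are all hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_density(
    data: List[Sequence[str]], center: Sequence[str], radius: int
) -> int:
    """Count objects within `radius` mismatches of `center`.

    The center itself is counted if it appears in `data`.
    """
    return sum(hamming(center, obj) <= radius for obj in data)


# Hypothetical toy dataset: three categorical attributes per object.
data = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "square", "small"),
    ("blue", "oval", "large"),
]

# Density around each object with a radius of one mismatching attribute.
densities = [neighborhood_density(data, obj, radius=1) for obj in data]
print(densities)
```

Because attribute values can only be compared for equality, the neighborhood is defined by the number of mismatches rather than by a geometric distance. In this toy data the last object has no neighbors other than itself, so a density-based method could plausibly treat it as an outlier.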