HISSCLU: a hierarchical density-based method for semi-supervised clustering

Authors:
Christian Böhm;Claudia Plant
Affiliations:
University of Munich;Technical University of Munich
Venue:
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year:
2008

Citing 15
Cited 3

Algorithms for clustering data
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Constrained K-means Clustering with Background Knowledge

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Round robin classification

The Journal of Machine Learning Research
Evolutionary semi-supervised fuzzy clustering

Pattern Recognition Letters
A probabilistic framework for semi-supervised clustering

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating constraints and metric learning in semi-supervised clustering

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Large margin hierarchical classification

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Supervised Clustering - Algorithms and Benefits

ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Supervised machine learning techniques for the classification of metabolic disorders in newborns

Bioinformatics
Hierarchical classification: combining Bayes with SVM

ICML '06 Proceedings of the 23rd international conference on Machine learning
Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data

Bioinformatics
Automatic extraction of clusters from hierarchical clustering representations

PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Ensembles of balanced nested dichotomies for multi-class problems

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information

RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
SHACUN: semi-supervised hierarchical active clustering based on ranking constraints

ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
A new semi-supervised hierarchical active clustering based on ranking constraints for analysts groupization

Applied Intelligence

Abstract

In situations where class labels are known for a part of the objects, a cluster analysis respecting this information, i.e. semi-supervised clustering, can give insight into the class and cluster structure of a data set. Several semi-supervised clustering algorithms, such as HMRF-K-Means [4], COP-K-Means [26] and the CCL-algorithm [18], have recently been proposed. Most of them extend well-known clustering methods (K-Means [22], Complete Link [17]) by enforcing two types of constraints: must-links between objects of the same class and cannot-links between objects of different classes. In this paper, we propose HISSCLU, a hierarchical, density-based method for semi-supervised clustering. Instead of deriving explicit constraints from the labeled objects, HISSCLU expands the clusters starting at all labeled objects simultaneously. During the expansion, each unlabeled object is assigned the class label that is most consistent with the cluster structure. Using this information, the hierarchical cluster structure is determined. The result is visualized in a semi-supervised cluster diagram showing both the cluster structure and the class assignment. Compared to methods based on must-links and cannot-links, our method better preserves the actual cluster structure, particularly if the data set contains several distinct clusters of the same class (i.e. the intra-class data distribution is multimodal). HISSCLU has a determinate result, is efficient, and is robust against noise. The performance of our algorithm is shown in an extensive experimental evaluation on synthetic and real-world data sets.
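
The idea of expanding clusters from all labeled objects at once, rather than deriving must-link and cannot-link constraints, can be illustrated with a small sketch. This is not the HISSCLU algorithm itself: it uses plain Euclidean distance instead of a density-based distance, builds no hierarchy, and draws no cluster diagram. The function name `expand_labels` and its parameters are hypothetical and chosen only for this illustration.

```python
import heapq
import math


def expand_labels(points, seed_labels):
    """Label every point by growing regions from all labeled seeds at once.

    points:      list of coordinate tuples
    seed_labels: dict mapping a point index to its known class label
    Returns a list with one label per point.

    Assumption: a simplified stand-in for the expansion described in the
    abstract, not the authors' method.
    """
    labels = [None] * len(points)
    for idx, lab in seed_labels.items():
        labels[idx] = lab

    # Heap entries: (distance from a labeled point, target index, label).
    heap = []

    def push_candidates(src):
        # Offer the label of `src` to every point that is still unlabeled.
        for j, q in enumerate(points):
            if labels[j] is None:
                d = math.dist(points[src], q)
                heapq.heappush(heap, (d, j, labels[src]))

    for idx in seed_labels:
        push_candidates(idx)

    # Always take the globally closest unlabeled point next, so all
    # labeled regions grow simultaneously.
    while heap:
        _, j, lab = heapq.heappop(heap)
        if labels[j] is not None:
            continue  # already claimed via a shorter link
        labels[j] = lab
        push_candidates(j)
    return labels


# Two well-separated groups on a line, one labeled object in each.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (6.0, 0.0)]
result = expand_labels(points, {0: "A", 5: "B"})
```

Because each unlabeled object is claimed by the closest already-labeled object, two separate groups that carry the same class label simply grow independently; no must-link forces them to merge, which is the multimodal case the abstract mentions.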