A Semi-supervised Clustering Algorithm Based on Must-Link Set

Authors:
Haichao Huang; Yong Cheng; Ruilian Zhao
Affiliations:
Computer Department, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China 100029 (all three authors)
Venue:
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Year:
2008



Abstract

Clustering analysis is traditionally considered an unsupervised learning process. In most cases, however, people have some prior or background knowledge before they perform clustering. How to use this knowledge to improve cluster quality and clustering efficiency has become a hot research topic in recent years. Must-Link and Cannot-Link constraints between instances are a common form of prior knowledge in many real applications. This paper presents the concept of the Must-Link Set and designs a new semi-supervised clustering algorithm, MLC-KMeans, which uses the Must-Link Set as an assistant centroid. Preliminary experiments on several UCI datasets confirm the effectiveness and efficiency of the algorithm.
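
The abstract does not define the Must-Link Set or the assistant centroid precisely, so the following is only an illustrative sketch under two stated assumptions: that a Must-Link Set is a group of instances connected, directly or transitively, by Must-Link constraints, and that its assistant centroid is the mean of its members. The function names `must_link_sets` and `set_centroid` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def must_link_sets(n, must_link):
    """Group instance indices 0..n-1 into sets joined by must-link pairs.

    Assumption: a Must-Link Set is the transitive closure of the pairs,
    computed here with a simple union-find structure.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    # Unconstrained instances form singleton groups; keep only real sets.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def set_centroid(points, members):
    """Mean of the member points (assumed form of an assistant centroid)."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


points = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [10.0, 10.0], [12.0, 10.0]]
sets = must_link_sets(len(points), [(0, 1), (1, 2), (3, 4)])
centroids = [set_centroid(points, s) for s in sets]
print(sets)       # [[0, 1, 2], [3, 4]]
print(centroids)  # [[1.0, 1.0], [11.0, 10.0]]
```

In a K-means style assignment step, one plausible use of such centroids would be to assign every member of a set to the same cluster, which satisfies the Must-Link constraints by construction. Whether MLC-KMeans works this way is not stated in the abstract.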