Background knowledge integration in clustering using purity indexes

Authors:
Germain Forestier;Cédric Wemmert;Pierre Gançarski
Affiliations:
Image Sciences, Computer Sciences and Remote Sensing Laboratory, University of Strasbourg, France;Image Sciences, Computer Sciences and Remote Sensing Laboratory, University of Strasbourg, France;Image Sciences, Computer Sciences and Remote Sensing Laboratory, University of Strasbourg, France
Venue:
KSEM'10 Proceedings of the 4th international conference on Knowledge science, engineering and management
Year:
2010

Abstract

In recent years, the use of background knowledge to improve the data mining process has been intensively studied. Background knowledge, along with knowledge directly or indirectly provided by the user, is often available. However, such knowledge is usually domain-dependent and therefore difficult to formalize. In this article, we study the integration of knowledge, in the form of labeled objects, into clustering algorithms. Several criteria for evaluating the purity of a clustering are presented, and their behaviours are compared on artificial datasets. The advantages and drawbacks of each criterion are analyzed to help the user choose among them.
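
To make the notion of purity concrete, the sketch below computes the classical purity index from a set of labeled objects: each cluster is credited with the size of its majority class, and the credits are summed and divided by the number of labeled objects. This is the textbook definition and is offered only as an illustration; the abstract does not state which criteria the article compares, so it should not be read as the authors' method. The function name `purity` and the convention that unlabeled objects carry the label `None` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, labels):
    """Classical purity of a clustering, computed on labeled objects only.

    cluster_ids: cluster assignment of each object.
    labels: known class of each object, or None when the object is unlabeled
            (this None convention is an assumption of the sketch).
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")

    # Count, for every cluster, how many labeled objects fall in each class.
    class_counts = defaultdict(Counter)
    n_labeled = 0
    for cluster, label in zip(cluster_ids, labels):
        if label is None:
            continue  # unlabeled objects carry no background knowledge
        class_counts[cluster][label] += 1
        n_labeled += 1

    if n_labeled == 0:
        raise ValueError("at least one labeled object is required")

    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(c.values()) for c in class_counts.values())
    return majority_total / n_labeled


# Six objects in two clusters; the fourth object is unlabeled.
clusters = [0, 0, 0, 1, 1, 1]
known = ["a", "a", "b", None, "b", "b"]
print(purity(clusters, known))  # 0.8
```

In this example cluster 0 holds two objects of class "a" and one of class "b", and cluster 1 holds two labeled objects of class "b", so four of the five labeled objects agree with their cluster's majority class. Note that this index reaches 1.0 trivially when every object is placed in its own cluster, a well-known limitation of plain purity.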