Semi-supervised constrained clustering: an expert-guided data analysis methodology

Authors:
Vid Podpečan; Miha Grčar; Nada Lavrač
Affiliations:
Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan Institute, Ljubljana, Slovenia and University of Nova Gorica, Nova Gorica, Slovenia
Venue:
PRICAI'10: Proceedings of the 11th Pacific Rim International Conference on Trends in Artificial Intelligence
Year:
2010

Abstract

This paper presents a methodology for expert-guided analysis of large data sets, including large text corpora. Its core is an algorithm for semi-supervised data clustering under cluster size constraints, which improves on existing constrained k-means clustering algorithms in three ways. First, it supports a larger set of user-defined cluster size constraints of different types (lower-bound and upper-bound constraints). Second, it dynamically re-assigns the predefined constraints to clusters during iterative cluster computation and optimization, which improves the results of constrained clustering. Third, it supports expert-guided cluster optimization by combining constrained clustering with data visualization; this gives the expert finer-grained control over the clustering process and further improves the quality of the resulting clustering solutions. Incorporating data visualization into the clustering process lets the user select reference points that act as constraint anchors during iterative cluster computation. The proposed semi-supervised constrained clustering methodology has been implemented in the service-oriented data mining environment Orange4WS and evaluated on several document corpora.
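
To make the idea of cluster size constraints concrete, the following is a minimal sketch of k-means with per-cluster lower and upper size bounds. It is not the algorithm from the paper: the greedy two-phase assignment, the function names `constrained_assign` and `size_constrained_kmeans`, and all parameters are assumptions chosen for illustration, and the sketch covers neither the dynamic re-assignment of constraints to clusters nor the visualization-based constraint anchors described above.

```python
import numpy as np


def constrained_assign(dist, lower, upper):
    """Greedily assign points to clusters under size bounds (illustrative heuristic).

    dist  : (n, k) array of point-to-centroid distances
    lower : length-k minimum cluster sizes
    upper : length-k maximum cluster sizes
    """
    n, k = dist.shape
    if sum(lower) > n or sum(upper) < n:
        raise ValueError("size constraints are infeasible for this data set")
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    # Visit all (point, cluster) pairs from the smallest distance to the largest.
    pairs = [divmod(int(idx), k) for idx in np.argsort(dist, axis=None)]
    # Phase 1: fill every cluster up to its lower bound.
    for i, j in pairs:
        if labels[i] == -1 and counts[j] < lower[j]:
            labels[i] = j
            counts[j] += 1
    # Phase 2: place the remaining points without exceeding the upper bounds.
    for i, j in pairs:
        if labels[i] == -1 and counts[j] < upper[j]:
            labels[i] = j
            counts[j] += 1
    return labels


def size_constrained_kmeans(X, k, lower, upper, n_iter=20, seed=0):
    """k-means loop whose assignment step respects the size bounds."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max(1, n_iter)):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = constrained_assign(dist, lower, upper)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(30, 2))
    labels, _ = size_constrained_kmeans(X, k=3, lower=[5, 5, 5], upper=[12, 12, 12])
    print(np.bincount(labels, minlength=3))
```

The greedy assignment is only a heuristic: it always satisfies feasible bounds, but it does not generally minimize the total within-cluster distance. An exact assignment step could instead be posed as a minimum-cost flow problem.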