This paper presents a methodology for expert-guided analysis of large data sets, including large text corpora. Its main ingredient is an algorithm for semi-supervised data clustering with cluster size constraints, which improves on existing constrained k-means clustering algorithms in several respects. First, it supports a larger set of user-defined cluster size constraints of different types (lower- and upper-bound constraints). Second, it allows predefined constraints to be dynamically re-assigned to clusters during iterative cluster computation and optimization, which improves the results of constrained clustering. Third, it supports expert-guided cluster optimization by combining constrained clustering with data visualization, giving the expert finer-grained control over the clustering process and further improving the quality of the resulting clustering solutions. Incorporating data visualization into the clustering process allows the user to select referential points that act as constraint anchors during iterative cluster computation. The proposed semi-supervised constrained clustering methodology has been implemented in Orange4WS, a service-oriented data mining environment, and evaluated on different document corpora.
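The abstract does not specify the algorithm's internals, so the following Python sketch only illustrates the general idea under our own assumptions. It is a k-means loop whose assignment step enforces lower- and upper-bound cluster size constraints exactly. Each cluster is expanded into capacity slots, and the resulting linear assignment problem is solved with SciPy's `linear_sum_assignment`. Two further pieces are hypothetical stand-ins for the paper's features. `match_bounds` re-assigns the (lower, upper) pairs to clusters at every iteration using a simple size-matching heuristic of our choosing. `anchors` pins user-selected reference points to clusters and seeds the centers with them. All function names are illustrative, and this is not the Orange4WS implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sq_dists(X, centers):
    """Squared Euclidean distance from every point (row of X) to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def match_bounds(X, centers, bounds):
    """HYPOTHETICAL re-assignment of size constraints to clusters: each
    (lower, upper) pair goes to the cluster whose unconstrained
    nearest-center size lies closest to that interval."""
    sizes = np.bincount(sq_dists(X, centers).argmin(axis=1), minlength=len(centers))
    gap = np.array([[max(lo - s, s - hi, 0) for lo, hi in bounds] for s in sizes])
    _, cols = linear_sum_assignment(gap)  # square matrix: rows come back as 0..k-1
    lower = np.array([bounds[c][0] for c in cols])
    upper = np.array([bounds[c][1] for c in cols])
    return lower, upper


def assign(X, centers, lower, upper, anchors=()):
    """Minimum-cost assignment with lower[j] <= |cluster j| <= upper[j].

    Cluster j is expanded into upper[j] capacity slots. Its first lower[j]
    slots carry a bonus larger than any single cost, so every optimal
    solution fills them. Point anchors[j] is pinned to cluster j by a
    prohibitive penalty (assumes the pins are compatible with the bounds).
    """
    n, k = len(X), len(centers)
    if lower.sum() > n or upper.sum() < n:
        raise ValueError("size bounds cannot be met with n points")
    d = sq_dists(X, centers)
    penalty = n * d.max() + 1.0  # more than any pin-respecting assignment costs
    for j, a in enumerate(anchors):
        d[a, np.arange(k) != j] += penalty
    bonus = d.max() + 1.0  # more than any single (penalized) cost
    cost = np.concatenate(
        [np.repeat(d[:, [j]], upper[j], axis=1) - bonus * (np.arange(upper[j]) < lower[j])
         for j in range(k)],
        axis=1,
    )
    owner = np.repeat(np.arange(k), upper)  # cluster that owns each slot column
    rows, slots = linear_sum_assignment(cost)
    labels = np.empty(n, dtype=int)
    labels[rows] = owner[slots]
    return labels


def constrained_kmeans(X, bounds, anchors=None, n_iter=50, seed=0):
    """k-means with k = len(bounds) size constraints; anchors, if given,
    hold one user-selected point index per cluster."""
    X = np.asarray(X, dtype=float)
    k = len(bounds)
    if anchors is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        anchors = ()
    else:
        centers = X[list(anchors)]  # reference points seed the centers
    labels = None
    for _ in range(n_iter):
        lower, upper = match_bounds(X, centers, bounds)
        new = assign(X, centers, lower, upper, anchors)
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers, (lower, upper)
```

For example, on ten 2-D points split 6/4 between two well-separated groups, `constrained_kmeans(X, [(5, 5), (5, 5)])` returns two clusters of exactly five points each, and passing `anchors=(0, 6)` additionally keeps point 0 in cluster 0 and point 6 in cluster 1. The slot expansion needs an n × Σu_j cost matrix, so a min-cost-flow formulation would likely scale better to large corpora.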