Itemset classified clustering

Authors:
Jun Sese;Shinichi Morishita
Affiliations:
Graduate School of Information Science and Technology, University of Tokyo;University of Tokyo and Institute for Bioinformatics and Research and Development, Japan Science and Technology Corporation
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Citing 0
Cited 4

Cluster-grouping: from subgroup discovery to clustering
Machine Learning

Integer linear programming models for constrained clustering
DS'10 Proceedings of the 13th international conference on Discovery science

Inductive querying for discovering subgroups and clusters
Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases

Learning predictive clustering rules
KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases

Abstract

Clustering results become more comprehensible and usable when individual groups are associated with characteristic descriptions. However, clustering followed by characterization of the resulting clusters does not always produce clusters associated with distinctive features, because the initial clustering step and the subsequent classification step are performed independently. This calls for a method that combines clustering and classification and executes both simultaneously.

In this paper, we focus on itemsets as the features for characterizing groups and present a technique called itemset classified clustering. It divides data into groups under the restriction that only divisions expressed using a common itemset are allowed, and it computes the optimal itemset maximizing the interclass variance between the groups. Although this optimization problem is generally intractable, we develop techniques that effectively prune the search space and efficiently compute optimal solutions in practice. We remark that itemset classified clusters are likely to be overlooked by traditional clustering algorithms such as two-clustering or k-means, and we demonstrate the scalability of our algorithm with respect to the amount of data by applying our method to real biological datasets.
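
To make the objective concrete, the following is a minimal Python sketch of the optimization the abstract describes. It is not the authors' algorithm: it enumerates candidate itemsets exhaustively, with none of the pruning the paper develops. It assumes each record pairs a set of items with a numeric vector, and that interclass variance is the usual between-group scatter (each group's size times the squared distance of its centroid from the overall centroid). The abstract does not spell out that definition, so treat it as an assumption. All function names and the `max_size` parameter are hypothetical.

```python
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interclass_variance(vectors, mask):
    """Between-group scatter of the split given by mask (assumed definition)."""
    inside = [v for v, m in zip(vectors, mask) if m]
    outside = [v for v, m in zip(vectors, mask) if not m]
    if not inside or not outside:
        return 0.0  # a one-sided split separates nothing
    overall = centroid(vectors)
    return (len(inside) * sq_dist(centroid(inside), overall)
            + len(outside) * sq_dist(centroid(outside), overall))


def best_itemset(records, max_size=2):
    """Brute-force search for the itemset whose split scores highest.

    records: list of (set_of_items, vector) pairs.
    Returns (score, itemset); exponential in max_size, no pruning.
    """
    items = sorted(set().union(*(r[0] for r in records)))
    vectors = [r[1] for r in records]
    best = (0.0, frozenset())
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # A record belongs to the group if it contains every item.
            mask = [itemset <= r[0] for r in records]
            score = interclass_variance(vectors, mask)
            if score > best[0]:
                best = (score, itemset)
    return best


# Toy data: only records holding both "a" and "b" sit near (1, 1).
records = [
    ({"a", "b"}, [1.0, 1.0]),
    ({"a", "b"}, [1.2, 0.8]),
    ({"a"}, [5.0, 5.0]),
    ({"b"}, [5.2, 4.8]),
    ({"c"}, [4.8, 5.2]),
]
score, itemset = best_itemset(records)
print(sorted(itemset), round(score, 3))
```

In this toy data neither "a" nor "b" alone separates the low-valued records cleanly, but the pair does, so the search selects the itemset containing both. The exhaustive loop grows exponentially with the itemset size, which is the kind of cost the paper's pruning techniques are meant to avoid.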