C4.5: programs for machine learning
C4.5: programs for machine learning
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A matrix density based algorithm to hierarchically co-cluster documents and words
WWW '03 Proceedings of the 12th international conference on World Wide Web
Ontologies Improve Text Document Clustering
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree
IEEE Transactions on Knowledge and Data Engineering
Hierarchical Clustering Algorithms for Document Datasets
Data Mining and Knowledge Discovery
International Journal of Approximate Reasoning
Hi-index | 0.01 |
Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop a clustering algorithm toward the outlined goal of building interpretable and reconfigurable cluster models; it works by generating rules with disjunctions and conditions on the frequencies of words, to decide on the membership of a document to a cluster. Each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. We show that our approach outperforms the unsupervised decision tree approach by huge margins. We show that the purity and f-measure losses to achieve interpretability are as little as 5% and 3% respectively using our approach.