Beyond cluster labeling: Semantic interpretation of clusters' contents using a graph representation

Authors:
François Role; Mohamed Nadif
Venue:
Knowledge-Based Systems
Year:
2014

Abstract

Efficient clustering algorithms have been developed to automatically group documents into subgroups (clusters). Once clustering has been performed, an important additional step is to help users make sense of the resulting clusters. Existing methods address this issue by assigning to each cluster a flat list of descriptive terms (labels) extracted from the documents, most often using statistical techniques borrowed from feature selection or feature reduction. A limitation of these unstructured descriptions of clusters' contents is that they do not account for the meaningful relationships between the terms. In contrast, we propose a graph representation that makes the clusters easier to interpret by putting the descriptive terms in context and by performing some simple network analysis. Our experiments reveal that the proposed method allows for a deeper level of interpretation, both when the clusters under study are homogeneous and when they are heterogeneous. In addition, evaluation procedures presented in the paper show that the graph-based representation of each cluster, while very concise, still reflects the original content of the cluster quite faithfully.
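The abstract does not spell out how the graph is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that descriptive terms are chosen by document frequency, that an edge links two terms appearing in the same document, and that weighted degree serves as the "simple network analysis". The function names `term_graph` and `weighted_degree` and the toy cluster are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_graph(docs, top_k=5):
    """Build a weighted co-occurrence graph over a cluster's top terms.

    docs: list of token lists for one cluster (assumed input format).
    Returns (labels, edges); edges maps an alphabetically ordered term
    pair to the number of documents containing both terms.
    """
    # Document frequency stands in for a real label-selection score (assumption).
    df = Counter(term for doc in docs for term in set(doc))
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, _ in ranked[:top_k]]
    keep = set(labels)

    edges = Counter()
    for doc in docs:
        present = sorted(keep & set(doc))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return labels, edges


def weighted_degree(edges):
    """Sum the edge weights incident to each term."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Toy cluster of four tokenized documents (made-up data for illustration).
cluster = [
    ["graph", "cluster", "label"],
    ["graph", "cluster", "term"],
    ["graph", "label", "term"],
    ["cluster", "document"],
]
labels, edges = term_graph(cluster, top_k=4)
degree = weighted_degree(edges)
```

A flat label list would report only `labels`. The edge weights additionally show which terms occur together, and the degree scores suggest which term is most central in the cluster.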