A graph-based topic extraction method enabling simple interactive customization

Authors:
Ajitesh Srivastava; Axel J. Soto; Evangelos Milios
Affiliations:
Birla Institute of Technology and Science, Pilani, India; Dalhousie University, Halifax, NS, Canada; Dalhousie University, Halifax, NS, Canada
Venue:
Proceedings of the 2013 ACM symposium on Document engineering
Year:
2013




Abstract

It is often desirable to identify the concepts present in a corpus. A popular way to achieve this is to discover clusters of words, or topics, and many algorithms for doing so exist in the literature. Yet most of these methods lack the interpretability needed for interaction with users who are not familiar with their inner workings. This paper proposes a graph-based topic extraction algorithm, which can also be viewed as a soft clustering of the words in a given corpus. Each topic, in the form of a set of words, represents an underlying concept in the corpus. The method makes the clustering process easy to interpret and hence allows the user to intervene at various steps. To evaluate the extracted topics quantitatively, we use them as features to obtain a compact representation of documents for classification tasks. We compare the classification accuracy achieved with the reduced feature set obtained by our method against that achieved with two other topic extraction techniques, namely Latent Dirichlet Allocation and Non-negative Matrix Factorization. While the results of all three algorithms are comparable, the speed and easy interpretability of our algorithm make it more suitable for interactive use by lay users.
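The abstract does not spell out the algorithm itself. As a rough, hypothetical illustration of the two ideas it mentions (a soft clustering of words on a word graph, and topics reused as compact document features), the Python sketch below builds a document co-occurrence graph, forms possibly overlapping topics from strongly connected seed words and their nearest neighbours, and maps each document to the fraction of its tokens that fall in each topic. The seed-and-neighbour heuristic, the edge weighting, and the function names `cooccurrence_graph`, `extract_topics` and `topic_features` are all assumptions made for illustration; this is not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Build a weighted word graph: the weight of edge (a, b) is the number of
    documents in which words a and b both occur (assumed weighting)."""
    graph = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph.setdefault(a, Counter())[b] += 1
            graph.setdefault(b, Counter())[a] += 1
    return graph


def extract_topics(graph, num_topics, topic_size):
    """Hypothetical seed-and-neighbour heuristic (not the paper's algorithm).

    Seeds are visited in order of total edge weight; each topic is a seed plus
    its most strongly connected neighbours. Words already placed in a topic are
    not reused as seeds, but may still join later topics as neighbours, so the
    resulting clustering of words is soft (topics can overlap)."""
    def strength(word):
        return sum(graph[word].values())

    topics, covered = [], set()
    for seed in sorted(graph, key=lambda w: (-strength(w), w)):
        if len(topics) == num_topics:
            break
        if seed in covered:
            continue
        # Neighbours by descending edge weight, ties broken alphabetically.
        neighbours = sorted(graph[seed], key=lambda w: (-graph[seed][w], w))
        topic = [seed] + neighbours[: topic_size - 1]
        topics.append(topic)
        covered.update(topic)
    return topics


def topic_features(doc, topics):
    """Compact document vector: fraction of the document's tokens in each topic."""
    if not doc:
        return [0.0] * len(topics)
    topic_sets = [set(t) for t in topics]
    return [sum(tok in ts for tok in doc) / len(doc) for ts in topic_sets]


# Toy corpus of pre-tokenized documents (made up for illustration).
docs = [
    ["graph", "node", "edge", "graph"],
    ["graph", "edge", "path"],
    ["topic", "word", "corpus"],
    ["topic", "corpus", "document"],
]
topics = extract_topics(cooccurrence_graph(docs), num_topics=2, topic_size=3)
features = [topic_features(d, topics) for d in docs]
print(topics)  # [['corpus', 'topic', 'document'], ['edge', 'graph', 'node']]
```

Because every topic in this sketch is just a plain list of words, a user could inspect or edit it (for example, drop a word or merge two topics) and then recompute `topic_features`; the resulting vectors, with one dimension per topic, could be fed to any standard classifier in place of the full vocabulary.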