TopicExplorer: exploring document collections with topic models

  • Authors:
  • Alexander Hinneburg; Rico Preiss; René Schröder

  • Affiliations:
  • Informatik, Martin-Luther-University Halle-Wittenberg, Halle, Germany; Informatik, Martin-Luther-University Halle-Wittenberg, Halle, Germany; Informatik, Martin-Luther-University Halle-Wittenberg, Halle, Germany

  • Venue:
  • ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
  • Year:
  • 2012

Abstract

The demo presents a prototype, called TopicExplorer, that combines topic modeling, keyword search, and visualization techniques to explore a large collection of Wikipedia documents. Topics derived by Latent Dirichlet Allocation are presented by their top words. In addition, topics are accompanied by image thumbnails extracted from related Wikipedia documents to aid sense-making of the derived topics during browsing. Topics are shown in a linear order such that similar topics are close together, and this order is used to map topics to colors. The auto-completion of search terms suggests words together with their color-coded topics, which allows users to explore the relation between search terms and topics. Retrieved documents are shown with color-coded topics as well. Relevant documents and topics found during browsing can be put onto a shortlist. The tool can recommend further documents with respect to the average topic mixture of the shortlist.
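
The shortlist-based recommendation described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how documents are compared with the average topic mixture, so cosine similarity is an assumption here, and the function names (`average_mixture`, `cosine`, `recommend`) and the toy topic mixtures are hypothetical.

```python
import math

def average_mixture(mixtures):
    """Component-wise mean of several topic mixtures of equal length."""
    n = len(mixtures)
    k = len(mixtures[0])
    return [sum(m[i] for m in mixtures) / n for i in range(k)]

def cosine(a, b):
    """Cosine similarity (assumed measure; the abstract names none)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(doc_topics, shortlist, top_n=3):
    """Rank documents outside the shortlist by similarity to its average mixture."""
    target = average_mixture([doc_topics[d] for d in shortlist])
    scored = [
        (cosine(target, mix), doc)
        for doc, mix in doc_topics.items()
        if doc not in shortlist
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Hypothetical per-document topic mixtures over three topics,
# such as an LDA model might produce.
doc_topics = {
    "A": [0.8, 0.1, 0.1],
    "B": [0.7, 0.2, 0.1],
    "C": [0.1, 0.8, 0.1],
    "D": [0.75, 0.15, 0.1],
    "E": [0.1, 0.1, 0.8],
}

print(recommend(doc_topics, ["A", "B"], top_n=2))
```

With documents A and B on the shortlist, the average mixture is dominated by the first topic, so document D, which has nearly the same mixture, is ranked first. Other measures suited to probability distributions, such as Jensen-Shannon divergence, could be substituted for the cosine without changing the overall structure.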