TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling

  • Authors:
  • Brynjar Gretarsson; John O’Donovan; Svetlin Bostandjiev; Tobias Höllerer; Arthur Asuncion; David Newman; Padhraic Smyth

  • Affiliations:
  • University of California Santa Barbara; University of California Santa Barbara; University of California Santa Barbara; University of California Santa Barbara; University of California Irvine; University of California Irvine; University of California Irvine

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST)
  • Year:
  • 2012

Abstract

We present TopicNets, a Web-based system for visual and interactive analysis of large sets of documents using statistical topic models. A range of visualization types and control mechanisms to support knowledge discovery are presented. These include corpus- and document-specific views, iterative topic modeling, search, and visual filtering. Drill-down functionality is provided to allow analysts to visualize individual document sections and their relations within the global topic space. Analysts can search across a dataset through a set of expansion techniques on selected document and topic nodes. Furthermore, analysts can select relevant subsets of documents and perform real-time topic modeling on these subsets to interactively visualize topics at various levels of granularity, allowing for a better understanding of the documents. A discussion of the design and implementation choices for each visual analysis technique is presented. This is followed by a discussion of three diverse use cases in which TopicNets enables fast discovery of information that is otherwise hard to find. These include a corpus of 50,000 successful NSF grant proposals, 10,000 publications from a large research center, and single documents including a grant proposal and a PhD thesis.
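The expansion-based search over document and topic nodes mentioned in the abstract can be illustrated with a minimal sketch. It is not the TopicNets implementation: it assumes a bipartite graph in which a document is linked to a topic when its topic weight meets a threshold, and it treats one expansion step as adding the direct neighbours of the selected nodes. The names `build_graph`, `expand`, and `threshold`, along with the toy topic weights, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(doc_topic, threshold=0.2):
    """Link each document to every topic whose weight meets the threshold.

    Assumption: edges are undirected and stored as adjacency sets.
    """
    graph = defaultdict(set)
    for doc, weights in doc_topic.items():
        for topic, weight in weights.items():
            if weight >= threshold:
                graph[("doc", doc)].add(("topic", topic))
                graph[("topic", topic)].add(("doc", doc))
    return graph


def expand(graph, selected):
    """Return the selected nodes plus their direct neighbours (one step)."""
    result = set(selected)
    for node in selected:
        result |= graph.get(node, set())
    return result


# Hypothetical toy data: per-document topic proportions.
doc_topic = {
    "d1": {"t1": 0.7, "t2": 0.3},
    "d2": {"t1": 0.1, "t2": 0.9},
    "d3": {"t3": 1.0},
}

graph = build_graph(doc_topic)
step1 = expand(graph, {("doc", "d1")})  # d1 and the topics linked to it
step2 = expand(graph, step1)            # plus documents sharing those topics
```

In this sketch, expanding twice from a document node reaches other documents that share one of its topics, while documents with no topic in common stay out of the result.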