TopicTiling: a text segmentation algorithm based on LDA

  • Authors:
  • Martin Riedl; Chris Biemann

  • Affiliations:
  • Technische Universität Darmstadt, Hochschulstrasse, Darmstadt, Germany; Technische Universität Darmstadt, Hochschulstrasse, Darmstadt, Germany

  • Venue:
  • ACL '12 Proceedings of ACL 2012 Student Research Workshop
  • Year:
  • 2012

Abstract

This work presents a text segmentation algorithm called TopicTiling. The algorithm is based on the well-known TextTiling algorithm and segments documents using the Latent Dirichlet Allocation (LDA) topic model. We show that using the mode topic ID, that is, the topic most frequently assigned to each word during LDA inference (the procedure used to annotate unseen documents), improves performance by stabilizing the obtained topics. We show significant improvements over state-of-the-art segmentation algorithms on two standard datasets. As an additional benefit, TopicTiling performs the segmentation in linear time and is thus computationally less expensive than other LDA-based segmentation methods.
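To make the idea more concrete, here is a minimal sketch of a TextTiling-style pipeline driven by mode topic IDs. It is not the authors' implementation. It assumes LDA inference has already been run and that, for every word in every sentence, the topic IDs sampled across inference iterations are available (the `sentence_samples` layout is hypothetical). Each word's topic is reduced to its mode, and each sentence becomes a topic-count vector. Cosine similarity between windows on either side of every sentence gap gives a coherence curve, and a depth score measures how far each gap dips below the nearest peaks on both sides. The helper names (`mode_topics`, `coherence_scores`, `depth_scores`, `segment`), the `window` parameter, and the boundary threshold (mean depth minus half a standard deviation, a TextTiling-style rule) are illustrative assumptions.

```python
from collections import Counter
import math
import statistics


def mode_topics(samples):
    """Most frequent topic ID per word (hypothetical input layout).

    samples[w] lists the topic IDs sampled for word w across inference
    iterations; on ties, Counter keeps the topic encountered first.
    """
    return [Counter(ids).most_common(1)[0][0] for ids in samples]


def topic_vector(topic_ids, num_topics):
    """Count how often each topic ID occurs in one sentence."""
    vec = [0] * num_topics
    for t in topic_ids:
        vec[t] += 1
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coherence_scores(vectors, window):
    """Cosine similarity of summed topic vectors left and right of each gap."""
    scores = []
    for gap in range(1, len(vectors)):
        left = [sum(col) for col in zip(*vectors[max(0, gap - window):gap])]
        right = [sum(col) for col in zip(*vectors[gap:gap + window])]
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Half the summed drop from the nearest left and right peaks to each gap."""
    depths = []
    for i, c in enumerate(scores):
        hl, j = c, i - 1
        while j >= 0 and scores[j] >= hl:  # climb left while values rise
            hl, j = scores[j], j - 1
        hr, j = c, i + 1
        while j < len(scores) and scores[j] >= hr:  # climb right while values rise
            hr, j = scores[j], j + 1
        depths.append(0.5 * ((hl - c) + (hr - c)))
    return depths


def segment(sentence_samples, num_topics, window=1):
    """Return indices of sentences that start a new segment (assumed threshold)."""
    vectors = [topic_vector(mode_topics(s), num_topics) for s in sentence_samples]
    depths = depth_scores(coherence_scores(vectors, window))
    if not depths:
        return []
    threshold = statistics.mean(depths) - statistics.pstdev(depths) / 2
    return [i + 1 for i, d in enumerate(depths) if d > threshold]


# Toy input: two sentences dominated by topic 0, then two by topic 1.
sentence_samples = [
    [[0, 0, 1], [0, 0]],
    [[0], [0, 1, 0]],
    [[1, 1, 0], [1]],
    [[1], [1, 1]],
]
print(segment(sentence_samples, num_topics=2))  # [2]
```

In this toy example the coherence curve is `[1.0, 0.0, 1.0]`, so the only deep valley is the gap before sentence 2, which becomes the single boundary. Taking the mode over several sampled assignments, rather than using a single sample, is what the abstract credits with stabilizing the topics. The sketch simply consumes those samples and does not perform LDA inference itself.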