Topic analysis using a finite mixture model

  • Authors:
  • Hang Li; Kenji Yamanishi

  • Affiliations:
  • Microsoft Research Asia, 5F Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing, China and Internet Systems Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 21 ...; Internet Systems Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216-855, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2003

Abstract

This paper addresses the issue of 'topic analysis', which determines a text's topic structure: a representation indicating which topics a text contains and how those topics change within it. Topic analysis consists of two main tasks: topic identification and text segmentation. Although topic analysis would be highly useful in a variety of text processing applications, no previous study has addressed it sufficiently. This paper proposes a statistical learning approach to the issue. Specifically, topics are represented by word clusters, and a finite mixture model, referred to as a stochastic topic model (STM), is used to represent the word distribution within a text. In topic analysis, a given text is segmented by detecting significant differences between STMs, and topics are identified by estimating STMs. Experimental results indicate that the proposed method significantly outperforms methods that combine existing techniques.
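The abstract does not give the model's equations, so what follows is only a minimal Python sketch of the two tasks under stated assumptions, not the authors' method. It assumes an STM of the form P(w) = Σ_k P(k) P(w|k), where each word cluster k has a fixed, smoothed word distribution P(w|k) (the helper `cluster_distributions` and its smoothing parameter `eps` are hypothetical) and only the mixing weights P(k) are estimated, here by EM. Segmentation proposes a boundary where STMs fitted to the windows just before and just after a position differ most; the L1 distance between weight vectors, the `window` size, and the `threshold` are hypothetical stand-ins for the paper's significance test. Topic identification reports the clusters whose estimated weight in a segment exceeds a hypothetical `min_weight`.

```python
from typing import Dict, List, Sequence, Set

Dist = Dict[str, float]


def cluster_distributions(clusters: Sequence[Set[str]], vocab: Set[str],
                          eps: float = 0.01) -> List[Dist]:
    """Hypothetical P(w|k): mass (1 - eps) spread uniformly over cluster k's
    words, smoothing mass eps spread uniformly over all other vocabulary words.
    Assumes every cluster is a subset of vocab."""
    dists = []
    for cluster in clusters:
        outside = len(vocab) - len(cluster)
        in_mass = 1.0 if outside == 0 else 1.0 - eps
        dist = {}
        for w in vocab:
            if w in cluster:
                dist[w] = in_mass / len(cluster)
            else:
                dist[w] = eps / outside
        dists.append(dist)
    return dists


def fit_stm(words: Sequence[str], dists: List[Dist], iters: int = 50) -> List[float]:
    """Estimate the mixing weights P(k) of sum_k P(k) P(w|k) by EM,
    holding the cluster distributions P(w|k) fixed."""
    k_count = len(dists)
    weights = [1.0 / k_count] * k_count
    for _ in range(iters):
        counts = [0.0] * k_count
        for w in words:
            joint = [weights[k] * dists[k].get(w, 0.0) for k in range(k_count)]
            total = sum(joint)
            if total == 0.0:
                continue  # word unknown to every cluster: ignore it
            for k in range(k_count):
                counts[k] += joint[k] / total  # E-step: posterior P(k|w)
        n = sum(counts)
        if n == 0.0:
            break
        weights = [c / n for c in counts]  # M-step: renormalized expected counts
    return weights


def segment(words: Sequence[str], dists: List[Dist], window: int = 10,
            threshold: float = 0.5) -> List[int]:
    """Return positions i where the STMs of words[i-window:i] and
    words[i:i+window] differ by more than threshold (L1 distance between
    weight vectors) and the difference is a local maximum."""
    positions = list(range(window, len(words) - window + 1))
    scores = []
    for i in positions:
        left = fit_stm(words[i - window:i], dists)
        right = fit_stm(words[i:i + window], dists)
        scores.append(sum(abs(a - b) for a, b in zip(left, right)))
    boundaries = []
    for j, i in enumerate(positions):
        prev_s = scores[j - 1] if j > 0 else float("-inf")
        next_s = scores[j + 1] if j + 1 < len(scores) else float("-inf")
        if scores[j] > threshold and scores[j] >= prev_s and scores[j] >= next_s:
            boundaries.append(i)
    return boundaries


def identify_topics(words: Sequence[str], dists: List[Dist],
                    min_weight: float = 0.2) -> List[int]:
    """Report the indices of clusters whose estimated weight P(k) is at least min_weight."""
    weights = fit_stm(words, dists)
    return [k for k, p in enumerate(weights) if p >= min_weight]
```

With two hand-made clusters such as {"goal", "match", "team", "score"} and {"stock", "market", "price", "trade"}, a text of 20 sports words followed by 20 finance words is split at position 20, and each half is assigned its own cluster. A real system would also have to learn the word clusters and their distributions from data rather than supply them by hand as this sketch does.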