Injecting Structured Data to Generative Topic Model in Enterprise Settings

  • Authors:
  • Han Xiao; Xiaojie Wang; Chao Du

  • Affiliations:
  • Technische Universität München, Garching bei München, Germany D-85748 and Beijing University of Posts and Telecommunications, Beijing, China 100876; Beijing University of Posts and Telecommunications, Beijing, China 100876; Beihang University, Beijing, China 100191

  • Venue:
  • ACML '09 Proceedings of the 1st Asian Conference on Machine Learning: Advances in Machine Learning
  • Year:
  • 2009

Abstract

Enterprises have steadily accumulated both structured and unstructured data as computing resources improve. However, previous research on enterprise data mining often treats these two kinds of data independently and overlooks their mutual benefits. We explore an approach for incorporating a common type of structured data (i.e., the organigram) into a generative topic model. Our approach, the Partially Observed Topic model (POT), considers not only the unstructured words but also the structured information in its generation process. Because the structured data is integrated implicitly, the topic mixtures over documents are partially observed during the Gibbs sampling procedure. This allows POT to learn topics in a targeted, directed way, which makes it easy to tune and suitable for end-use applications. We evaluate the proposed model on a real-world dataset and show improved expressiveness over traditional LDA. In a document classification task, POT also demonstrates more discriminative power than LDA.
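
The abstract does not spell out POT's sampling equations, so the following is only a minimal sketch of one plausible reading of "partially observed" topics, not the authors' algorithm. It assumes that the structured data (for example, an author's position in the organigram) has already been mapped to a set of permitted topics per document, and it runs standard collapsed Gibbs sampling for LDA restricted to that set. The function name `gibbs_partially_observed`, the `allowed` argument, and the hyperparameter defaults are all hypothetical.

```python
import random


def gibbs_partially_observed(docs, allowed, n_topics, vocab_size,
                             alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with per-document topic restrictions.

    docs:    list of documents, each a list of integer word ids
    allowed: allowed[d] is the list of topic ids permitted for document d
             (ASSUMPTION: derived beforehand from the structured data)
    Returns (assign, doc_topic, topic_word): topic assignments and counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]               # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                              # n(k)
    assign = []

    # Initialise each token with a random topic from its document's allowed set.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(allowed[d])
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Standard LDA conditional, evaluated only over allowed topics.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in allowed[d]
                ]
                k = rng.choices(allowed[d], weights=weights)[0]

                # Record the new assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assign, doc_topic, topic_word
```

Under this assumption, a document whose allowed set contains a single topic has its topic fully observed, while a document allowed every topic behaves as in ordinary LDA; the observed documents then steer the topic-word counts that the unrestricted documents sample from.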