Entity Ranking from Annotated Text Collections Using Multitype Topic Models

  • Authors:
  • Hitohiro Shiozaki; Koji Eguchi

  • Affiliations:
  • Graduate School of Science and Technology, Kobe University, Kobe, Japan 657-8501; Graduate School of Engineering, Kobe University, Kobe, Japan 657-8501

  • Venue:
  • Focused Access to XML Documents
  • Year:
  • 2008

Abstract

Recently, topic-model-based retrieval methods have produced good results by using the Latent Dirichlet Allocation (LDA) model or its variants within the language modeling framework. However, when these LDA-based methods are used to retrieve annotated documents, some post-processing outside the model is required to make use of the multiple word types specified by the annotations. In this paper, we explore new retrieval methods using a "multitype topic model" that can directly handle multiple word types, such as the annotated entities, category labels, and other words typically used in Wikipedia. We investigate how to apply the multitype topic model effectively to retrieving documents from an annotated collection, and we demonstrate the effectiveness of our methods through entity-ranking experiments on a Wikipedia collection.
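To make "topic-model-based retrieval in the language modeling framework" concrete, here is a minimal sketch. It is not the authors' method, because the abstract gives no formulas. It assumes the common approach of interpolating a Dirichlet-smoothed document model with the topic model's estimate, Σ_z P(w | z) P(z | d). It also assumes a simple multitype extension in which each word type (e.g. "word", "entity") has its own topic-word distributions `phi[type]` while sharing the document's topic proportions `theta_d`. All names (`build_collection`, `score`, `lam`, `mu`) and the toy data are hypothetical.

```python
import math

def build_collection(docs):
    """Per-type collection statistics: {type: (token counts, total tokens)}."""
    stats = {}
    for doc in docs:
        for wtype, tokens in doc.items():
            counts, length = stats.get(wtype, ({}, 0))
            for tok in tokens:
                counts[tok] = counts.get(tok, 0) + 1
            stats[wtype] = (counts, length + len(tokens))
    return stats

def dirichlet_prob(word, tokens, counts, length, mu):
    """Dirichlet-smoothed P(w | d) using the tokens of one word type."""
    p_coll = counts.get(word, 0) / length if length else 0.0
    return (tokens.count(word) + mu * p_coll) / (len(tokens) + mu)

def topic_prob(word, wtype, theta_d, phi):
    """Sum over topics z of P(w | z, type) * P(z | d); phi is assumed type-specific."""
    return sum(theta_d[z] * phi[wtype][z].get(word, 0.0) for z in range(len(theta_d)))

def score(query, doc, theta_d, phi, coll, lam=0.7, mu=1000.0, floor=1e-12):
    """Log query likelihood; query is a list of (word, type) pairs (assumed format)."""
    total = 0.0
    for word, wtype in query:
        counts, length = coll[wtype]
        p_doc = dirichlet_prob(word, doc.get(wtype, []), counts, length, mu)
        p_topic = topic_prob(word, wtype, theta_d, phi)
        # Floor guards against log(0) for words unseen everywhere.
        total += math.log(max(lam * p_doc + (1 - lam) * p_topic, floor))
    return total

# Toy example: two topics (sport, music) and two word types (hypothetical data).
phi = {
    "word": [{"football": 0.5, "goal": 0.5}, {"guitar": 0.5, "song": 0.5}],
    "entity": [{"Messi": 1.0}, {"Beatles": 1.0}],
}
doc1 = {"word": ["football", "goal", "goal"], "entity": ["Messi"]}
doc2 = {"word": ["guitar", "song"], "entity": ["Beatles"]}
coll = build_collection([doc1, doc2])
query = [("goal", "word"), ("Messi", "entity")]
s1 = score(query, doc1, [0.9, 0.1], phi, coll, mu=10.0)
s2 = score(query, doc2, [0.1, 0.9], phi, coll, mu=10.0)
print(s1 > s2)  # True
```

Keeping `theta_d` shared across types while giving each type its own `phi` lets entity and ordinary-word evidence reinforce the same topics without any post-processing outside the model. This is the property the abstract attributes to multitype topic models. The interpolation weight and smoothing parameter here are illustrative and not values from the paper.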