Entity Ranking from Annotated Text Collections Using Multitype Topic Models

Authors:
Hitohiro Shiozaki; Koji Eguchi
Affiliations:
Graduate School of Science and Technology, Kobe University, Kobe, Japan 657-8501; Graduate School of Engineering, Kobe University, Kobe, Japan 657-8501
Venue:
Focused Access to XML Documents
Year:
2008




Abstract

Topic model-based retrieval methods have recently produced good results using the Latent Dirichlet Allocation (LDA) model or its variants within the language modeling framework. However, when LDA-based methods are used to retrieve annotated documents, some post-processing outside the model is required to make use of the multiple word types specified by the annotations. In this paper, we explore new retrieval methods using a 'multitype topic model' that can directly handle multiple word types, such as the annotated entities, category labels, and other words typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection, and show the effectiveness of our methods through experiments on entity ranking using a Wikipedia collection.