Using Language Models and Topic Models for XML Retrieval

Authors:
Fang Huang
Affiliations:
School of Computing, The Robert Gordon University, Scotland
Venue:
Focused Access to XML Documents
Year:
2008

Citing 6
Cited 0

A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Modeling annotated data
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

XML retrieval: what to retrieve?
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Latent Dirichlet allocation
The Journal of Machine Learning Research

Parameter estimation for a simple hierarchical generative model for XML retrieval
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval

GPX: Gardens Point XML IR at INEX 2005
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval

Abstract

This paper presents the results of our participation in the INEX 2007 ad hoc track. We implemented two different models: a mixture language model and a topic model. For the language model, we focused on how shallow features of the text display information in an XML document can be used to enhance retrieval effectiveness. Our language model combined estimates based on the full text of an element with estimates based on a compact representation of that element. We also used non-content priors, including the location where the element appears in the original document and the length of the element path, to boost retrieval effectiveness. For the topic model, we looked at a recent statistical model called Latent Dirichlet Allocation [1] and explored how it could be applied to XML retrieval.
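The abstract does not give the scoring formula, so the following is only a minimal sketch of what a mixture language model with non-content priors might look like. The three-way linear interpolation (element full text, compact representation, collection), the weights in `lambdas`, and the form of the location and path-length priors are all assumptions made for illustration, not the method reported in the paper. The function name `score_element` and its parameters are hypothetical.

```python
import math
from collections import Counter


def term_prob(term, counts, total):
    """Maximum-likelihood estimate of a term in one token list."""
    return counts[term] / total if total else 0.0


def score_element(query, full_text, compact_text, collection,
                  lambdas=(0.4, 0.3, 0.3), location=1, path_length=1):
    """Log-score of one XML element for a query (illustrative only).

    Mixes term estimates from the element's full text, its compact
    representation, and the whole collection, then adds log priors.
    The weights and the prior shapes are assumptions, not the paper's.
    """
    l_full, l_compact, l_coll = lambdas
    full, compact, coll = Counter(full_text), Counter(compact_text), Counter(collection)
    n_full, n_compact, n_coll = len(full_text), len(compact_text), len(collection)

    log_score = 0.0
    for term in query:
        p = (l_full * term_prob(term, full, n_full)
             + l_compact * term_prob(term, compact, n_compact)
             + l_coll * term_prob(term, coll, n_coll))
        if p == 0.0:
            # Term unseen everywhere, including the collection.
            return float("-inf")
        log_score += math.log(p)

    # Hypothetical priors: favour elements that appear early in the
    # document (location counted from 1) and that have short paths.
    log_prior = -math.log(location) - math.log(path_length)
    return log_score + log_prior


collection = "xml retrieval language model topic model xml element".split()
query = ["xml", "retrieval"]

score_a = score_element(query,
                        "xml retrieval with language model".split(),
                        "xml retrieval".split(),
                        collection, location=1, path_length=2)
score_b = score_element(query,
                        "topic model for text".split(),
                        "topic model".split(),
                        collection, location=1, path_length=2)
print(score_a > score_b)
```

In this toy setup the element whose full text and compact representation both contain the query terms ranks above the one that matches only through the collection component, and a larger `location` or `path_length` lowers the score.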