Unsupervised latent concept modeling to identify query facets

  • Authors: Romain Deveaud; Eric SanJuan; Patrice Bellot

  • Affiliations: University of Avignon - LIA, Avignon, France; University of Avignon - LIA, Avignon, France; Aix-Marseille University - LSIS, Marseille, France

  • Venue: Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
  • Year: 2013

Abstract

Translating an information need into a keyword query can be a complex cognitive process, which often results in under-specification. Retrieving documents based solely on keywords can lead the user to browse documents that do not address the specific query facets she was looking for. We introduce an unsupervised method for mining and modeling latent search concepts in order to increase the coverage of these facets. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. The main strength of our approach is that it automatically estimates both the number of latent concepts and the number of feedback documents needed, without any prior training step. We evaluate our approach on two large ad-hoc TREC collections; the results show that it significantly improves document retrieval effectiveness and even represents the information need better than the original query.
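
To make the idea of mining latent concepts from feedback documents more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a handful of tokenized documents. It is an illustration only, not the authors' implementation: the function names `lda_gibbs` and `top_words`, the toy documents, the hyperparameters `alpha` and `beta`, and the fixed `num_topics` are all assumptions. In particular, the abstract states that the number of concepts and of feedback documents is estimated automatically, and this sketch does not attempt that step.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns one Counter of word counts per topic (hypothetical helper).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # tokens assigned to topic k
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word

def top_words(topic_word, n=3):
    """Most frequent words of each topic, skipping words with zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

# Toy stand-ins for pseudo-relevant feedback documents (assumed data).
feedback_docs = [
    "jaguar car engine speed car".split(),
    "jaguar cat jungle prey cat".split(),
    "car engine speed road".split(),
    "cat jungle prey forest".split(),
]
concepts = lda_gibbs(feedback_docs, num_topics=2)
for words in top_words(concepts):
    print(words)
```

Each printed word list plays the role of one latent concept of the query. In a retrieval setting, the feedback documents would come from the top-ranked results of an initial run of the query rather than from a hand-written list.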