Query classification using Wikipedia

  • Authors: Richard Khoury
  • Affiliations: Department of Software Engineering, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada
  • Venue: International Journal of Intelligent Information and Database Systems
  • Year: 2011

Abstract

Identifying the intended topic that underlies a user's query can benefit a wide range of applications, from search engines to question-answering systems. However, query classification remains difficult because of the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query itself. In this paper, we develop a new query classification system that accounts for these three challenges. Our system relies on the freely available online encyclopedia Wikipedia as a natural-language knowledge base, and exploits Wikipedia's structure to infer the correct classification of any given query. We present two variants of this system and demonstrate their reliability compared to each other and to the literature benchmarks, using the query sets from the KDD CUP 2005 and TREC 2007 competitions.