"Piaf" vs "Adele": classifying encyclopedic queries using automatically labeled training data

Authors:
Pedro Saleiro;Luís Sarmento
Affiliations:
DEI-FEUP, University of Porto, Porto, Portugal;LIACC-FEUP, University of Porto, Porto, Portugal
Venue:
Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Year:
2013


Abstract

Encyclopedic queries express the intent of obtaining information typically available in encyclopedias, such as biographical, geographical or historical facts. In this paper, we train a classifier for detecting the encyclopedic intent of web queries. To train this classifier, we automatically label training data from raw query logs. We use click-through data to select as positive examples of encyclopedic queries those queries that mostly lead to Wikipedia articles. We investigate a large set of features that can be generated to describe the input query. These features include both term-specific patterns and query projections on knowledge base items (e.g., Freebase). Results show that with these feature sets it is possible to achieve an F1 score above 87%, which is competitive with a Google-based baseline that uses a much wider set of signals to boost the ranking of Wikipedia for potential encyclopedic queries. The results also show that query projections on Wikipedia article titles and Freebase entity matches are the most relevant groups of features. Results tend to improve when the training set is restricted to frequent positive examples (i.e., rare queries are excluded).
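
The automatic labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(query, clicked_url)` log format, and the thresholds `min_clicks` and `min_wiki_share` are all hypothetical choices made for illustration. Treating every remaining frequent query as a negative example is also an assumption, since the abstract only specifies how positive examples are selected.

```python
from collections import defaultdict
from urllib.parse import urlparse


def is_wikipedia(url: str) -> bool:
    """Return True if the clicked URL points to a Wikipedia host."""
    host = urlparse(url).netloc.lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def label_queries(click_log, min_clicks=5, min_wiki_share=0.5):
    """Label queries from (query, clicked_url) pairs.

    A query is labeled positive (encyclopedic) when more than
    `min_wiki_share` of its clicks land on Wikipedia. Queries with
    fewer than `min_clicks` clicks are skipped as too rare.
    Both thresholds are illustrative assumptions, not values from
    the paper.
    """
    total = defaultdict(int)  # clicks per query
    wiki = defaultdict(int)   # clicks per query that led to Wikipedia
    for query, url in click_log:
        q = query.strip().lower()
        total[q] += 1
        if is_wikipedia(url):
            wiki[q] += 1

    labels = {}
    for q, n in total.items():
        if n < min_clicks:
            continue  # exclude rare queries
        # Assumption: frequent queries below the share threshold are negatives.
        labels[q] = (wiki[q] / n) > min_wiki_share
    return labels


if __name__ == "__main__":
    log = (
        [("edith piaf", "https://fr.wikipedia.org/wiki/Edith_Piaf")] * 4
        + [("edith piaf", "https://example.com/piaf")]
        + [("adele tickets", "https://example.com/tickets")] * 5
    )
    print(label_queries(log))
```

Raising `min_clicks` corresponds to the setting mentioned at the end of the abstract, where rare queries are excluded from the training set.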