Using word-sense disambiguation methods to classify web queries by intent

Authors:
Emily Pitler;Ken Church
Affiliations:
University of Pennsylvania, Philadelphia, PA;Johns Hopkins University, Baltimore, MD
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Year:
2009

Citing 26
Cited 4

Agglomerative clustering of a search engine query log

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A taxonomy of web search

ACM SIGIR Forum
Robust and flexible mixed-initiative dialogue for telephone services

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Decision lists for lexical ambiguity resolution: application to accent restoration in Spanish and French

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Understanding user goals in web search

Proceedings of the 13th international conference on World Wide Web
Hourly analysis of a very large topically categorized web query log

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Optimizing web search using web click-through data

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Automatic identification of user goals in Web search

WWW '05 Proceedings of the 14th international conference on World Wide Web
Improving Automatic Query Classification via Semi-Supervised Learning

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Detecting online commercial intention (OCI)

Proceedings of the 15th international conference on World Wide Web
Mining long-term search history to improve search accuracy

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Robust classification of rare queries using web knowledge

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Random walks on the click graph

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Predictive user click models based on click-through history

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Contextual advertising by combining relevance with click feedback

Proceedings of the 17th international conference on World Wide Web
Video suggestion and discovery for youtube: taking random walks through the view graph

Proceedings of the 17th international conference on World Wide Web
Simrank++: query rewriting through link analysis of the clickgraph (poster)

Proceedings of the 17th international conference on World Wide Web
To personalize or not to personalize: modeling queries with variation in user intent

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Learning query intent from regularized click graphs

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
SCOPE: easy and efficient parallel processing of massive data sets

Proceedings of the VLDB Endowment
Understanding the relationship between searchers' queries and information goals

Proceedings of the 17th ACM conference on Information and knowledge management
Proceedings of the 2009 workshop on Web Search Click Data

Second ACM International Conference on Web Search and Web Data Mining
Optimizing dialogue management with reinforcement learning: experiments with the NJFun system

Journal of Artificial Intelligence Research

Scalable multi-dimensional user intent identification using tree structured distributions

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
How many multiword expressions do people know?

MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Personalization of search profile using ant foraging approach

ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part IV
Deriving query intents from web search engine queries

Journal of the American Society for Information Science and Technology

Abstract

Three methods are proposed to classify queries by intent (CQI), such as navigational, informational, or commercial. Following mixed-initiative dialog systems, search engines should distinguish navigational queries, where the user is taking the initiative, from other queries, where there are more opportunities for system initiatives (e.g., suggestions, ads). The query intent problem has a number of useful applications for search engines, affecting how many (if any) advertisements to display, which results to return, and how to arrange the results page. Click logs are used as a substitute for annotation. Clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. We start with a simple Naïve Bayes baseline that works well when there is plenty of training data. When training data is less plentiful, we back off to nearby URLs in a click graph, using a method similar to Word-Sense Disambiguation. Thus, we can infer that the query "designer trench" is commercial because it is close to www.saksfifthavenue.com, which is known to be commercial. The baseline method was designed for precision and the backoff method was designed for recall. Both methods are fast and do not require crawling webpages. We recommend a third method, a hybrid of the two, that does no harm when there is plenty of training data, and generalizes better when there isn't, as a strong baseline for the CQI task.
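
The three methods in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy click log, the two-label set, the add-one smoothing, the `MIN_CLICKS` threshold, and the rule for switching between the baseline and the backoff are all assumptions made for illustration. The abstract does not specify how the hybrid combines the two methods, so the sketch simply uses the baseline when a query has enough clicks of its own and otherwise backs off to the URLs the query is linked to in the click graph.

```python
import math
from collections import Counter, defaultdict

# Toy click log (invented): (query, clicked URL, label).
# Assumption: a click on an ad is labeled "commercial", any other click "other".
CLICK_LOG = [
    ("saks coats", "www.saksfifthavenue.com", "commercial"),
    ("saks coats", "www.saksfifthavenue.com", "commercial"),
    ("saks coats", "www.saksfifthavenue.com", "commercial"),
    ("designer trench", "www.saksfifthavenue.com", "other"),
    ("trench warfare", "en.wikipedia.org", "other"),
    ("trench warfare", "en.wikipedia.org", "other"),
    ("trench warfare", "en.wikipedia.org", "other"),
]
LABELS = ("commercial", "other")
MIN_CLICKS = 3  # hypothetical threshold for "plenty of training data"


def train_naive_bayes(log):
    """Count labels and per-label query words from the click log."""
    label_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    vocab = set()
    for query, _url, label in log:
        label_counts[label] += 1
        for word in query.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def naive_bayes_predict(query, model):
    """Baseline: word-level Naive Bayes with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in LABELS:
        score = math.log((label_counts[label] + 1) / (total + len(LABELS)))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in query.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


def url_commercial_rates(log):
    """Fraction of clicks arriving at each URL that were labeled commercial."""
    clicks = defaultdict(Counter)
    for _query, url, label in log:
        clicks[url][label] += 1
    return {url: c["commercial"] / sum(c.values()) for url, c in clicks.items()}


def backoff_predict(query, log, rates):
    """Backoff: average the commercial rate of URLs adjacent to the query."""
    urls = {url for q, url, _label in log if q == query}
    if not urls:
        return None  # query never seen in the click graph
    mean_rate = sum(rates[url] for url in urls) / len(urls)
    return "commercial" if mean_rate >= 0.5 else "other"


def hybrid_predict(query, log, model, rates):
    """Hybrid (assumed rule): baseline for frequent queries, backoff otherwise."""
    own_clicks = sum(1 for q, _url, _label in log if q == query)
    if own_clicks >= MIN_CLICKS:
        return naive_bayes_predict(query, model)
    return backoff_predict(query, log, rates) or naive_bayes_predict(query, model)


MODEL = train_naive_bayes(CLICK_LOG)
RATES = url_commercial_rates(CLICK_LOG)

if __name__ == "__main__":
    for q in ("saks coats", "designer trench", "trench warfare"):
        print(q, "->", hybrid_predict(q, CLICK_LOG, MODEL, RATES))
```

In this toy data the query "designer trench" has a single non-ad click, so the word-level baseline has little evidence for commercial intent, while the backoff picks up the commercial signal from the other clicks that land on www.saksfifthavenue.com. That mirrors the precision versus recall trade-off the abstract describes, though only on invented data.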