It is widely believed that some queries submitted to search engines are by nature ambiguous (e.g., java, apple). However, few studies have investigated the questions "how many queries are ambiguous?" and "how can we automatically identify an ambiguous query?" This paper addresses both. First, we construct a taxonomy of query ambiguity and ask human annotators to manually classify queries according to it. From the manually labeled results, we find that query ambiguity is to some extent predictable. We then use a supervised learning approach to automatically classify queries as ambiguous or not. Experimental results show that we can correctly identify 87% of the labeled queries. Finally, we estimate that about 16% of queries in a real search log are ambiguous.
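As a rough illustration of what classifying queries as ambiguous or not from labeled examples could look like, here is a minimal sketch. The abstract does not specify the features or the learning algorithm, so everything in it is an assumption made for illustration: the single feature (entropy of the topic categories of a query's top results), the one-threshold classifier, the function names, and the toy training data are all hypothetical and are not the paper's method.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Shannon entropy (in bits) of the category labels of a query's top results.

    Hypothetical feature: results spread over many topics give high entropy.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fit_threshold(entropies, labels):
    """Pick the entropy cut-off with the best training accuracy (a decision stump)."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(entropies)):
        preds = [e >= t for e in entropies]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy, made-up training data: (categories of top results, human label "ambiguous?").
train = [
    (["programming", "programming", "programming", "programming"], False),
    (["programming", "travel", "food", "programming"], True),
    (["company", "fruit", "company", "fruit"], True),
    (["weather", "weather", "weather", "news"], False),
]

entropies = [category_entropy(cats) for cats, _ in train]
labels = [label for _, label in train]
threshold, train_acc = fit_threshold(entropies, labels)


def is_ambiguous(categories):
    """Predict ambiguity for a new query from its result categories."""
    return category_entropy(categories) >= threshold
```

A real system would use many features and a stronger learner, and would report accuracy on held-out labeled queries rather than on the training set; the stump above only shows the shape of the pipeline (features from labeled queries, a fitted decision rule, predictions on new queries).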