Classifying search queries using the Web as a source of knowledge

Authors:
Evgeniy Gabrilovich; Andrei Broder; Marcus Fontoura; Amruta Joshi; Vanja Josifovski; Lance Riedel; Tong Zhang
Affiliations:
Yahoo Research, Santa Clara, CA; Yahoo Research, Santa Clara, CA; PUC-Rio, Rio de Janeiro, Brazil; UCLA, Los Angeles, CA; Yahoo Research, Santa Clara, CA; Yahoo Research, Santa Clara, CA; Rutgers University, Piscataway, NJ
Venue:
ACM Transactions on the Web (TWEB)
Year:
2009

Abstract

We propose a methodology for building a robust query classification system that can identify thousands of query classes, while dealing in real time with the query volume of a commercial Web search engine. We use a pseudo relevance feedback technique: given a query, we determine its topic by classifying the Web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregate account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.
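
The core idea in the abstract, determining a query's topic by classifying the Web search results it retrieves, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy page collection, the term-overlap `toy_search` function, the keyword-based `toy_classify_page` function, and the choice to aggregate by summing per-page class scores. The abstract does not specify the classifier, the number of results used, or the aggregation scheme.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for a Web index: a handful of page texts.
TOY_PAGES = [
    "lumix dmc digital camera review with zoom lens",
    "buy lumix camera battery and lens online",
    "maldives resort hotel and flights fare guide",
    "cheap flights fare comparison for airline tickets",
]

# Hypothetical stand-in for a trained document classifier: keyword sets per class.
CLASS_KEYWORDS = {
    "electronics": {"camera", "lens", "battery", "digital"},
    "travel": {"hotel", "flights", "resort", "airline"},
}


def toy_search(query: str, k: int) -> List[str]:
    """Return up to k page texts, ranked by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for text in TOY_PAGES:
        overlap = len(terms & set(text.split()))
        if overlap > 0:
            scored.append((overlap, text))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps index order on ties
    return [text for _, text in scored[:k]]


def toy_classify_page(text: str) -> Dict[str, float]:
    """Score each class by the number of its keywords present in the page."""
    words = set(text.split())
    return {label: float(len(words & kws)) for label, kws in CLASS_KEYWORDS.items()}


def classify_query(
    query: str,
    search: Callable[[str, int], List[str]],
    classify_page: Callable[[str], Dict[str, float]],
    k: int = 3,
    top_n: int = 1,
) -> List[str]:
    """Classify a query by classifying its top-k search results.

    Per-page class scores are summed (an assumed aggregation), and the
    top_n classes with a positive total are returned.
    """
    totals: Dict[str, float] = defaultdict(float)
    for text in search(query, k):
        for label, score in classify_page(text).items():
            totals[label] += score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    # The query terms themselves match no class keyword; the evidence
    # comes entirely from the retrieved pages.
    print(classify_query("lumix dmc", toy_search, toy_classify_page))
    print(classify_query("maldives fare", toy_search, toy_classify_page))
```

The example queries are deliberately short and contain no class keywords, which mirrors why the approach is attractive for rare queries: the query text alone gives a classifier little to work with, while the retrieved pages supply much richer evidence. A query that retrieves nothing yields an empty result in this sketch.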