Analysis of varying approaches to topical web query classification

Authors:
Steven M. Beitzel;Eric C. Jensen;Abdur Chowdhury;Ophir Frieder
Affiliations:
Telcordia Technologies, Inc.;Summize, Inc.;Summize, Inc.;Illinois Institute of Technology
Venue:
Proceedings of the 3rd international conference on Scalable information systems
Year:
2008

Abstract

Topical classification of web queries has drawn recent interest from forums such as the 2005 KDD Cup because of the promise it offers in improving retrieval effectiveness and efficiency. Many proposed techniques use documents classified in taxonomies, such as the Open Directory Project (ODP, http://www.dmoz.org), to infer the class of a web query. Implicit in these approaches is the assumption that topically classifying queries is equivalent to the general topical text classification task, albeit with few features directly available from such short queries. We test this assumption by comparing and combining classifiers trained directly from manually classified queries and their retrieved documents, classifiers trained from categorized documents in the ODP, and classifiers induced from unlabeled query logs for pre-retrieval classification. We find that training classifiers directly from manually classified queries outperforms the best general topical classifier by 48% in relative F1 score. We attribute this to a task mismatch that arises when a general classifier is applied to queries. For example, a typically vague web query classified as "business" is likely to retrieve documents classified as "news" and "organizations" in addition to those labeled "business." Equating a "business" class of queries with a "business" class of documents, then, is not appropriate.
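
As a reading aid for the "48% in relative F1 score" figure, the sketch below shows how a relative F1 gain is conventionally computed: F1 is the harmonic mean of precision and recall, and the relative gain is the difference between two F1 scores divided by the baseline score. This assumes the conventional definition; the abstract does not spell out the exact computation. The function names and all counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Hypothetical confusion counts (NOT taken from the paper).
general_f1 = f1_score(tp=30, fp=40, fn=50)  # classifier trained on documents
query_f1 = f1_score(tp=30, fp=20, fn=20)    # classifier trained on queries

gain = relative_improvement(query_f1, general_f1)
print(f"general F1 = {general_f1:.2f}, query F1 = {query_f1:.2f}, "
      f"relative gain = {gain:.0%}")
```

Note that a relative gain is not a difference in percentage points: with these made-up counts, F1 rises by 0.2 in absolute terms, which is a 50% gain relative to the baseline.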