A feature-free search query classification approach using semantic distance

Authors:
Lin Li; Luo Zhong; Guandong Xu; Masaru Kitsuregawa
Affiliations:
School of Computer Science & Technology, Wuhan University of Technology, China; School of Computer Science & Technology, Wuhan University of Technology, China; Centre for Applied Informatics, Victoria University, Australia; Institute of Industrial Science, University of Tokyo, Japan
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

When classifying search queries into a set of target categories, conventional machine-learning-based approaches usually draw on external sources of information to obtain additional features for the queries and training data for the target categories. Unfortunately, these approaches need large amounts of training data to achieve high classification precision. Moreover, they adapt poorly to changes in the target categories, which may arise from the dynamic changes observed in both Web topic taxonomies and Web content. In this paper, we propose a feature-free classification approach based on semantic distance. We analyze the queries and categories themselves and use the number of Web pages containing both a query and a category as a semantic distance that determines their similarity. The most attractive property of our approach is that it relies only on the Web page counts estimated by a search engine, yet classifies search queries with respectable accuracy. In addition, it adapts easily to changes in the target categories, whereas machine-learning-based approaches require an extensive, time-consuming, and costly updating process (e.g., re-labeling outdated training data and re-training classifiers). We evaluate the effectiveness of our approach experimentally using a set of rank measures and show that it performs competitively with popular state-of-the-art solutions, which, however, frequently rely on external sources and are inherently less flexible.
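The abstract does not state the exact distance formula. The sketch below is a minimal illustration, assuming the normalized Google distance (NGD) of Cilibrasi and Vitányi as the page-count-based measure: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(·) is a search engine hit count and N is the total number of indexed pages. The function names, the index size `n_pages`, and all page counts in the usage example are hypothetical; a real system would obtain the counts by querying a search engine.

```python
import math
from typing import Dict, List, Tuple


def normalized_web_distance(f_x: int, f_y: int, f_xy: int, n_pages: int) -> float:
    """NGD-style distance from page counts (assumed measure, not the paper's).

    f_x, f_y: estimated hit counts for the query and the category alone
    f_xy:     estimated hit count for pages containing both
    n_pages:  assumed total number of indexed pages (normalizing constant)
    """
    if min(f_x, f_y, f_xy) <= 0:
        # No (or inconsistent) co-occurrence evidence: treat as maximally distant.
        return math.inf
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    numerator = max(log_x, log_y) - log_xy
    denominator = math.log(n_pages) - min(log_x, log_y)
    return numerator / denominator


def rank_categories(
    query_count: int,
    category_counts: Dict[str, int],
    joint_counts: Dict[str, int],
    n_pages: int,
) -> List[Tuple[str, float]]:
    """Rank target categories for one query, most similar (smallest distance) first."""
    scored = [
        (cat, normalized_web_distance(query_count, cat_count,
                                      joint_counts.get(cat, 0), n_pages))
        for cat, cat_count in category_counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical hit counts for the query "python tutorial".
    ranking = rank_categories(
        query_count=1_000_000,
        category_counts={"Computers": 50_000_000, "Sports": 40_000_000},
        joint_counts={"Computers": 500_000, "Sports": 10_000},
        n_pages=10**10,
    )
    print(ranking[0][0])  # → Computers
```

Because only hit counts are needed, adding or renaming a target category means issuing a few more count lookups rather than relabeling data and retraining a classifier, which is the flexibility argument the abstract makes.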