Automatic classification of Web queries using very large unlabeled query logs

Authors:
Steven M. Beitzel;Eric C. Jensen;David D. Lewis;Abdur Chowdhury;Ophir Frieder
Affiliations:
Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL;David D. Lewis Consulting, Chicago, IL;Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2007

Abstract

Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system must route queries to a subset of topic-specific and resource-constrained back-end databases. Successful query classification poses a challenging problem, as Web queries are short, thus providing few features. This feature sparseness, coupled with the constantly changing distribution and vocabulary of queries, hinders traditional text classification. We attack this problem by combining multiple classifiers, including exact lookup and partial matching in databases of manually classified frequent queries, linear models trained by supervised learning, and a novel approach based on mining selectional preferences from a large unlabeled query log. Our approach classifies queries without using external sources of information, such as online Web directories or the contents of retrieved pages, making it viable for use in demanding operational environments, such as large-scale Web search services. We evaluate our approach using a large sample of queries from an operational Web search engine and show that our combined method increases recall by nearly 40% over the best single method while maintaining adequate precision. Additionally, we compare our results to those from the 2005 KDD Cup and find that we perform competitively despite our operational restrictions. This suggests it is possible to topically classify a significant portion of the query stream without requiring external sources of information, allowing for deployment in operationally restricted environments.
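The combination described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy data, the function names (`mine_preferences`, `association`, `classify`), the 0.5 threshold, and the choice to combine classifiers by taking the union of their predictions are all assumptions made for illustration. The association score follows the general Resnik-style formulation of selectional preference, which may differ in detail from what the paper uses. Partial matching and the supervised linear models are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper).
LABELED = {"chicago hotels": {"travel"}, "jazz lyrics": {"music"}}
UNLABELED_LOG = ["boston hotels", "cheap hotels", "rock lyrics",
                 "blues lyrics", "boston flights"]
# Hypothetical lexicon mapping single terms to known categories.
TERM_CATEGORIES = {"hotels": {"travel"}, "flights": {"travel"},
                   "lyrics": {"music"}}


def mine_preferences(log, term_categories):
    """Count, for each leading term x, the categories of the terms that follow it."""
    pair_counts = defaultdict(Counter)   # x -> Counter of categories
    category_counts = Counter()          # overall category frequencies
    for query in log:
        terms = query.split()
        if len(terms) != 2:              # the sketch only uses two-term queries
            continue
        x, y = terms
        for u in term_categories.get(y, ()):
            pair_counts[x][u] += 1
            category_counts[u] += 1
    return pair_counts, category_counts


def association(x, u, pair_counts, category_counts):
    """Resnik-style selectional association of leading term x with category u."""
    counts_x = pair_counts.get(x, Counter())
    total_x = sum(counts_x.values())
    total = sum(category_counts.values())
    if total_x == 0 or total == 0:
        return 0.0

    def contribution(c):
        p_c_given_x = counts_x[c] / total_x
        if p_c_given_x == 0:
            return 0.0
        p_c = category_counts[c] / total
        return p_c_given_x * math.log(p_c_given_x / p_c)

    # Preference strength: KL divergence between P(U | x) and P(U).
    strength = sum(contribution(c) for c in category_counts)
    return contribution(u) / strength if strength > 0 else 0.0


def classify(query, labeled, pair_counts, category_counts, threshold=0.5):
    """Union of exact lookup and mined selectional-preference predictions."""
    predicted = set(labeled.get(query, ()))          # exact-match lookup
    terms = query.split()
    if len(terms) >= 2:
        for u in category_counts:
            if association(terms[0], u, pair_counts, category_counts) >= threshold:
                predicted.add(u)
    return predicted


PAIRS, CATS = mine_preferences(UNLABELED_LOG, TERM_CATEGORIES)
print(classify("chicago hotels", LABELED, PAIRS, CATS))
print(classify("boston airfare", LABELED, PAIRS, CATS))
```

In this toy setting the unseen query "boston airfare" is labeled only because "boston" was followed by travel terms in the unlabeled log, which is the kind of recall gain over exact lookup that the abstract attributes to combining methods.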