Mining Historic Query Trails to Label Long and Rare Search Engine Queries

Authors:
Peter Bailey;Ryen W. White;Han Liu;Giridhar Kumaran
Affiliations:
Microsoft;Microsoft Research;Carnegie Mellon University;Microsoft
Venue:
ACM Transactions on the Web (TWEB)
Year:
2010

Abstract

Web search engines can perform poorly for long queries (i.e., those containing four or more terms), in part because such queries are highly specific. Automatically assigning labels to long queries can capture aspects of a user’s search intent that may not be apparent from the query terms. This makes it possible to match or rerank search results using both the query and its labels rather than the query text alone. Query labels can be derived from interaction logs generated from many users’ search result clicks, or from query trails comprising the chain of URLs visited following query submission. However, since long queries are typically rare, they are difficult to label in this way because little or no historic log data exists for them. A subset of these queries may be amenable to labeling by detecting similarities between parts of a long and rare query and queries that appear in logs. In this article, we compare four similarity algorithms for the automatic assignment of Open Directory Project category labels to long and rare queries, based solely on matching against similar satisfied query trails extracted from log data. Our findings show that although the similarity-matching algorithms we investigated have tradeoffs in terms of coverage and accuracy, one algorithm that bases similarity on a popular search result ranking function (effectively regarding potentially similar queries as “documents”) outperforms the others. We find that it is possible to correctly predict the top label more than one time in five, even when no past query trail exactly matches the long and rare query. We show that these labels can be used to reorder top-ranked search results, leading to a significant improvement in retrieval performance over baselines that do not use query labeling and instead rank results using content matching or click-through logs.
The outcomes of our research have implications for search providers seeking to deliver highly relevant search results for long queries.
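
The idea of treating logged queries as “documents” can be illustrated with a minimal sketch. The abstract does not name the ranking function, so the sketch assumes BM25 with conventional default parameters; the function names, the toy log, the label weights, and the category strings are all hypothetical and are not taken from the article. It scores each logged query against the long query, then sums the labels of the best-matching logged queries, weighted by their similarity score.

```python
import math
from collections import Counter, defaultdict


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized 'document' (here, a logged query) against query_terms.

    BM25 and the k1/b defaults are assumptions made for this sketch.
    """
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of logged queries containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in set(query_terms):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def label_query(long_query, logged, top_k=3):
    """Predict labels for long_query from similar logged queries.

    logged: list of (query_text, {label: weight}) pairs, where the label
    weights stand in for labels derived from each query's trails.
    Returns (label, score) pairs sorted by descending score.
    """
    docs = [q.lower().split() for q, _ in logged]
    scores = bm25_scores(long_query.lower().split(), docs)
    ranked = sorted(zip(scores, logged), key=lambda p: p[0], reverse=True)
    totals = defaultdict(float)
    for score, (_, labels) in ranked[:top_k]:
        if score <= 0:
            continue  # no term overlap, so this logged query contributes nothing
        for label, weight in labels.items():
            totals[label] += score * weight
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)


# Toy log with illustrative category labels.
LOGGED = [
    ("cheap flights", {"Recreation/Travel": 1.0}),
    ("python tutorial", {"Computers/Programming": 1.0}),
    ("flights to paris", {"Recreation/Travel": 0.8, "Regional/Europe": 0.2}),
    ("jaguar car price", {"Recreation/Autos": 1.0}),
]

predicted = label_query("cheapest direct flights from seattle to paris in june", LOGGED)
```

With this toy log, the long query has no exact match, yet it shares terms with two travel-related logged queries, so the top predicted label is the travel category. When no logged query shares a term with the input, the function returns an empty list, which corresponds to the coverage side of the coverage and accuracy tradeoff mentioned above.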