Discovering missing click-through query language information for web search

Authors:
Xing Yi; James Allan
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA (both authors)
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

The click-through information in web query logs has been widely used for web search tasks. However, it usually suffers from data sparseness, known as the missing or incomplete click problem: a large volume of pages receive few or no clicks. In this paper, we adapt two language-modeling-based approaches to address this issue when web query logs are used for web search. The first approach discovers missing click-through query language features for web pages with few or no clicks by drawing on the click-associated queries of similar pages in the query logs. We further propose combining this content-based approach with a random walk on the click graph to reduce click-through sparseness further. The second approach follows the query expansion method: it uses the queries and their clicked web pages in the query logs to reconstruct a structured variant of the relevance-based language model for each user-input query. We design experiments with a publicly available query log excerpt and two TREC web search tasks on the GOV2 and ClueWeb09 corpora to evaluate the search performance of the different approaches. Our results show that using the discovered semantic click-through query language features yields statistically significant improvements in search performance over baselines that do not use the discovered information. Combining the click-through features discovered by the random walk with those discovered by the content-based approach improves search performance further.
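The abstract does not give the estimator for the content-based approach. The following is a minimal sketch of the general idea, not the paper's formulation: a page with few or no clicks borrows the click-associated queries of its most similar pages. It rests on several assumptions. Similarity is cosine over raw term counts. Only the top-`k` clicked neighbors are used. The neighbors' query unigram models are averaged with similarity weights and then linearly interpolated with the page's own click-query model using weight `alpha`. The function and parameter names (`discover_click_lm`, `k`, `alpha`) are hypothetical.

```python
from collections import Counter
import math


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def query_lm(queries):
    """Maximum-likelihood unigram model over the terms of a page's clicked queries."""
    counts = Counter()
    for q in queries:
        counts.update(q.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def discover_click_lm(target, pages, clicked_queries, k=2, alpha=0.5):
    """Hypothetical sketch: estimate a click-through query language model for `target`.

    pages: page_id -> page text; clicked_queries: page_id -> list of clicked queries.
    """
    tv = tf_vector(pages[target])
    # Rank the other pages that have clicks by content similarity to the target.
    scored = sorted(
        ((cosine(tv, tf_vector(pages[p])), p)
         for p in pages if p != target and clicked_queries.get(p)),
        reverse=True,
    )[:k]
    z = sum(s for s, _ in scored)
    # Similarity-weighted mixture of the neighbors' click-query models.
    neighbor = Counter()
    for s, p in scored:
        if s <= 0:
            continue  # skip neighbors with no content overlap
        for t, prob in query_lm(clicked_queries[p]).items():
            neighbor[t] += (s / z) * prob
    own = query_lm(clicked_queries.get(target, []))
    if not own:
        return dict(neighbor)  # unclicked page: rely on neighbors only
    if not neighbor:
        return own
    # Linear interpolation of the page's own model with the neighbor model.
    terms = set(own) | set(neighbor)
    return {t: alpha * own.get(t, 0.0) + (1 - alpha) * neighbor.get(t, 0.0)
            for t in terms}


# Toy usage: page "a" has no clicks, so it inherits queries from similar page "b".
pages = {
    "a": "tax forms irs filing",
    "b": "irs tax filing deadline",
    "c": "national park camping",
}
clicked = {"b": ["irs tax deadline", "file taxes"], "c": ["park camping"]}
lm_unclicked = discover_click_lm("a", pages, clicked, k=1)
lm_clicked = discover_click_lm("a", pages, dict(clicked, a=["tax forms"]), k=2)
print(sorted(lm_unclicked.items()))
```

In this toy case, page "a" receives the query terms clicked for page "b" (for example "deadline"). It receives nothing from "c", which shares no content with it. Under the same assumptions, a click-graph random-walk estimate could be mixed in as one more interpolated component, which is one way to read the combination the abstract describes.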