Learning topical transition probabilities in click through data with regression models

Authors:
Xiao Zhang;Prasenjit Mitra
Affiliations:
The Pennsylvania State University;The Pennsylvania State University
Venue:
Proceedings of the 13th International Workshop on the Web and Databases
Year:
2010

Abstract

The transition of search engine users' intents has long been studied. Knowledge of intent transitions, once discovered, can yield a better understanding of how different topics are related and can be used in many applications, such as building recommender systems and ranking. In this paper, we study the problem of finding the transition probabilities of digital library users' intents among different topics. We use click-through data from CiteSeerX and extract the click chains. Each document in a click chain is represented by a topical vector generated by LDA models. We then model the task of finding the topical transition probabilities as a multiple-output linear regression problem, in which the input and output are two consecutive topical vectors in the click chain and the elements of the weight matrix correspond to the transition probabilities. Given the constraints of our task, we propose a new algorithm based on the exponentiated gradient. Our algorithm provides good interpretability as well as a small sum-of-squares error comparable to that of existing regression methods. We are particularly interested in the off-diagonal elements of the learned weight matrix, since they represent the transition probabilities between different topics. The authors' interpretation of these transitions is given at the end of the paper.
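
The regression setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It is a generic exponentiated-gradient update for a multiple-output linear regression, under the assumption that the constraints are a non-negative weight matrix whose rows each sum to 1. The function names (eg_fit, make_pairs), the learning rate, the epoch count, and the toy two-topic data are all hypothetical.

```python
import math


def make_pairs(true_w, inputs):
    """Build noise-free (x, y) training pairs with y = x @ true_w (toy data only)."""
    k = len(true_w)
    pairs = []
    for x in inputs:
        y = [sum(x[i] * true_w[i][j] for i in range(k)) for j in range(k)]
        pairs.append((x, y))
    return pairs


def eg_fit(pairs, num_topics, eta=0.5, epochs=200):
    """Fit W so that x @ W approximates y, keeping each row of W a distribution.

    W[i][j] is read as the probability of moving from topic i to topic j.
    Assumption: rows are non-negative and sum to 1.
    """
    k = num_topics
    # Start from uniform transition probabilities.
    w = [[1.0 / k] * k for _ in range(k)]
    for _ in range(epochs):
        for x, y in pairs:
            # Prediction and residual for the squared-error loss.
            pred = [sum(x[i] * w[i][j] for i in range(k)) for j in range(k)]
            err = [pred[j] - y[j] for j in range(k)]
            for i in range(k):
                # Multiplicative (exponentiated-gradient) step; the gradient of
                # the squared error with respect to W[i][j] is 2 * x[i] * err[j].
                row = [w[i][j] * math.exp(-eta * 2.0 * x[i] * err[j])
                       for j in range(k)]
                # Renormalize so the row stays on the probability simplex.
                total = sum(row)
                w[i] = [v / total for v in row]
    return w


# Toy example with two topics and a known transition matrix.
true_w = [[0.8, 0.2], [0.3, 0.7]]
pairs = make_pairs(true_w, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
learned_w = eg_fit(pairs, num_topics=2)
off_diagonal = [(i, j, learned_w[i][j])
                for i in range(2) for j in range(2) if i != j]
```

Because the update is multiplicative and each row is renormalized, the weights stay non-negative and each row remains a valid distribution, which is what lets the off-diagonal entries be read as cross-topic transition probabilities.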