Utilizing variability of time and term content, within and across users in session detection

Authors:
Shuqi Sun;Sheng Li;Muyun Yang;Haoliang Qi;Tiejun Zhao
Affiliations:
Harbin Institute of Technology;Harbin Institute of Technology;Harbin Institute of Technology;Heilongjiang Institute of Technology;Harbin Institute of Technology
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

In this paper, we describe an SVM classification framework for the session detection task on both Chinese and English query logs. Using eight features that capture temporal and content information extracted from pairs of successive queries, the classification models perform significantly better than the state-of-the-art method. Additionally, ROC analysis shows that discriminative power varies greatly among different features and, for the same feature, across different users. To fully exploit this variability, we build local models for individual users and combine their predictions with those of the global model. Experiments show that the local models yield significant, although small, improvements over the global model.
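
The abstract does not list the eight features or say how local and global predictions are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes two illustrative features for a pair of successive queries (a time gap and a term-overlap score), measures how well a single feature separates session boundaries using ROC AUC, and mixes a global and a per-user probability by linear interpolation. The names `Query`, `pair_features`, `auc`, `combine`, and the `weight` parameter are hypothetical, and whitespace tokenization is an assumption that would not suit Chinese queries without a word segmenter.

```python
from dataclasses import dataclass


@dataclass
class Query:
    timestamp: float  # seconds since an arbitrary origin
    text: str


def pair_features(prev: Query, curr: Query) -> dict:
    """Two illustrative features for a pair of successive queries (assumed, not from the paper)."""
    gap = curr.timestamp - prev.timestamp
    a = set(prev.text.lower().split())
    b = set(curr.text.lower().split())
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    return {"time_gap": gap, "term_jaccard": jaccard}


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combine(global_prob: float, local_prob: float, weight: float = 0.5) -> float:
    """Linear interpolation of global and per-user boundary probabilities.

    `weight` is a hypothetical tuning parameter; the abstract does not give the combination rule.
    """
    return weight * local_prob + (1.0 - weight) * global_prob


# Toy log for one user; labels[i] == 1 marks a session boundary between log[i] and log[i + 1].
log = [
    Query(0, "cheap flights beijing"),
    Query(30, "cheap flights beijing shanghai"),
    Query(2000, "svm tutorial"),
    Query(2040, "svm kernel tutorial"),
]
labels = [0, 1, 0]

feats = [pair_features(p, c) for p, c in zip(log, log[1:])]
gap_auc = auc([f["time_gap"] for f in feats], labels)
# Lower term overlap suggests a boundary, so the overlap is negated to score boundaries.
overlap_auc = auc([-f["term_jaccard"] for f in feats], labels)
```

Computing `auc` separately on each user's query pairs, rather than on the pooled log, is one way to observe the per-user variability in a feature's discriminative power that the abstract reports.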