Query session detection as a cascade

Authors:
Matthias Hagen;Benno Stein;Tino Rüb
Affiliations:
Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

We propose a cascading method for query session detection, the problem of identifying series of consecutive queries that a user submits with the same information need. While existing session detection research mostly deals with effectiveness, we also focus on efficiency and investigate questions related to the analysis trade-off: How expensive (in terms of runtime) is a certain improvement in F-Measure? In this regard, we distinguish two major scenarios where query session knowledge is important: (1) In an online setting, the search engine tries to incorporate knowledge of the preceding queries to improve retrieval performance. Here, the efficiency of the session detection method is crucial, as it should not add noticeably to the overall retrieval time. (2) In an offline post-retrieval setting, search engine logs are divided into sessions in order to examine what causes users to fail or to identify typical reformulation patterns. Here, efficiency might not be as important as in the online scenario, but the accuracy of the detected sessions is essential. Our cascading method provides a sensible treatment for both scenarios. It involves different steps that form a cascade in the sense that computationally costly, and hence time-consuming, features are applied only after cheap features have "failed." This differs from previous session detection methods, most of which involve many features simultaneously. Experiments on a standard test corpus show that the cascading method saves runtime compared to the state of the art while detecting sessions with even higher accuracy.
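
The cascade idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the individual features, so everything concrete here is an assumption made for illustration: the two steps (`term_overlap_step`, `char_ngram_step`), the threshold of 0.5, the `default` decision, and the helper names `cascade` and `segment` are hypothetical and are not the paper's actual steps. The sketch only shows the control flow: a costlier step runs only when every cheaper step before it returned no decision.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A step returns True (same session), False (new session), or None (undecided).
Step = Callable[[str, str], Optional[bool]]


def term_overlap_step(prev: str, curr: str) -> Optional[bool]:
    """Cheap step (hypothetical): decide only on clear lexical evidence."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    if not a or not b:
        return None
    if a <= b or b <= a:  # one query's terms contain the other's
        return True
    return None  # no clear evidence: hand over to the next step


def char_ngram_step(prev: str, curr: str, n: int = 3,
                    threshold: float = 0.5) -> Optional[bool]:
    """Costlier step (hypothetical): Jaccard similarity of character n-grams."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    a, b = grams(prev), grams(curr)
    if not a or not b:
        return None
    similarity = len(a & b) / len(a | b)
    return True if similarity >= threshold else None


# Steps ordered from cheapest to most expensive (assumed ordering).
STEPS: List[Tuple[str, Step]] = [
    ("term_overlap", term_overlap_step),
    ("char_ngram", char_ngram_step),
]


def cascade(prev: str, curr: str, steps: Sequence[Tuple[str, Step]],
            default: bool = False) -> Tuple[bool, str]:
    """Run steps in order and stop at the first one that decides."""
    for name, step in steps:
        decision = step(prev, curr)
        if decision is not None:
            return decision, name
    return default, "default"  # no step decided: assume a new session


def segment(queries: Sequence[str],
            steps: Sequence[Tuple[str, Step]]) -> List[List[str]]:
    """Split one user's consecutive queries into sessions."""
    sessions: List[List[str]] = []
    for query in queries:
        if sessions and cascade(sessions[-1][-1], query, steps)[0]:
            sessions[-1].append(query)
        else:
            sessions.append([query])
    return sessions


if __name__ == "__main__":
    log = ["weimar hotels", "weimar hotels cheap", "python tutorial"]
    print(segment(log, STEPS))
```

Returning the name of the deciding step alongside the decision makes it possible to count how often each step is reached, which is one way to relate runtime cost to accuracy gains in the spirit of the trade-off question the abstract raises.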