Activity logs from large-scale systems facilitate the study of user behavior, which can be used to improve and tune the user experience. However, the available data often lacks important elements such as the identification of user sessions. Previous work typically compensated for this by setting a threshold of around 30 minutes and assuming that breaks in activity longer than the threshold reflect breaks between sessions. We show that using such a global threshold introduces artifacts that may affect the analysis, because there is a high probability that long sessions are not identified correctly. As an alternative, we suggest that a suitable individual threshold be found for each user, based on that user's activity pattern. Applying this approach to a large dataset from the AOL search engine leads to a distribution of session durations that is free of the artifacts that appear when using a global threshold.
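The difference between a global and an individual threshold can be illustrated with a minimal sketch. It assumes the log is a list of (user, timestamp-in-seconds) pairs and that per-user thresholds are already known. The abstract does not say how the individual thresholds are derived, so the function names `split_sessions` and `sessionize`, the sample log, and the 3600-second threshold for user `u1` are all hypothetical placeholders, not the authors' method.

```python
from collections import defaultdict

# The ~30-minute global convention described above, in seconds.
GLOBAL_THRESHOLD = 30 * 60


def split_sessions(timestamps, threshold):
    """Group timestamps (seconds) into sessions.

    A gap strictly longer than `threshold` starts a new session.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)  # gap is short: same session
        else:
            sessions.append([t])    # first event, or gap too long
    return sessions


def sessionize(log, per_user_threshold=None, default=GLOBAL_THRESHOLD):
    """Split each user's activity into sessions.

    `per_user_threshold` maps user -> threshold in seconds. How those
    values are chosen is NOT specified here (assumption: supplied by
    the caller); users without an entry fall back to `default`.
    """
    per_user_threshold = per_user_threshold or {}
    by_user = defaultdict(list)
    for user, t in log:
        by_user[user].append(t)
    return {
        user: split_sessions(times, per_user_threshold.get(user, default))
        for user, times in by_user.items()
    }


# Hypothetical sample log: (user, seconds since start of trace).
log = [("u1", 0), ("u1", 600), ("u1", 3000), ("u2", 0), ("u2", 2400)]

global_result = sessionize(log)
individual_result = sessionize(log, per_user_threshold={"u1": 3600})
```

With the global threshold, the 40-minute pause in `u1`'s activity splits it into two sessions. With the illustrative individual threshold, the same activity is kept as one long session, which is the kind of case the abstract says a global threshold tends to misidentify.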