The results of a Web query log analysis may shift significantly depending on the fraction of agents (non-human clients) that are not excluded from the log. To detect and exclude agents, Web log studies apply a threshold to the number of requests submitted by a client during the observation period. However, different studies use different observation periods, and a threshold assigned to one period is usually not comparable with a threshold assigned to another. We propose a uniform method that works equally well across different observation periods. The method is based on the sliding window technique: a threshold is assigned to the sliding window rather than to the whole observation period. In addition, we determine sub-optimal values for the two parameters of the method, the window size and the threshold, and recommend 5-7 unique queries as an upper bound on the threshold for a 1-hour sliding window.
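The sliding window idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log layout (tuples of client id, timestamp in seconds, query string), the function names `max_unique_in_window` and `detect_agents`, the half-open window convention, and the default threshold of 7 (the upper end of the recommended 5-7 range) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def max_unique_in_window(events, window_seconds=3600):
    """Return the largest number of unique queries seen in any sliding window.

    `events` is a list of (timestamp_seconds, query) pairs for ONE client.
    Assumption: a window covers events less than `window_seconds` apart.
    """
    events = sorted(events)
    counts = Counter()  # query -> occurrences inside the current window
    left = 0            # index of the oldest event still inside the window
    best = 0
    for ts, query in events:
        counts[query] += 1
        # Drop events that are a full window (or more) older than the current one.
        while ts - events[left][0] >= window_seconds:
            old_query = events[left][1]
            counts[old_query] -= 1
            if counts[old_query] == 0:
                del counts[old_query]
            left += 1
        best = max(best, len(counts))
    return best


def detect_agents(log, window_seconds=3600, threshold=7):
    """Return the set of clients exceeding `threshold` unique queries per window.

    `log` is an iterable of (client_id, timestamp_seconds, query) records
    (a hypothetical layout chosen for this sketch).
    """
    by_client = defaultdict(list)
    for client, ts, query in log:
        by_client[client].append((ts, query))
    return {
        client
        for client, events in by_client.items()
        if max_unique_in_window(events, window_seconds) > threshold
    }


# Toy log: "bot" sends 10 distinct queries within 10 minutes, while "spread"
# sends 10 distinct queries two hours apart over a long observation period.
log = [("bot", i * 60, f"q{i}") for i in range(10)]
log += [("spread", i * 7200, f"s{i}") for i in range(10)]
log += [("human", 0, "weather"), ("human", 50000, "news")]
agents = detect_agents(log)
```

In this toy log both "bot" and "spread" submit 10 unique queries in total, so a threshold on the whole observation period could not separate them, whereas the per-window threshold flags only "bot". Because the threshold is tied to the window, the same parameters can be applied to logs covering different observation periods.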