Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Static index pruning for information retrieval systems
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to the special issue on the web as corpus
Computational Linguistics - Special issue on web as corpus
Computational Linguistics
The Google Similarity Distance
IEEE Transactions on Knowledge and Data Engineering
Introduction to Information Retrieval
Introduction to Information Retrieval
Quantitative comparisons of search engine results
Journal of the American Society for Information Science and Technology
Investigation of the accuracy of search engine hit counts
Journal of Information Science
Graph-based word clustering using a web search engine
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
A prediction model for web search hit counts using word frequencies
Journal of Information Science
Hit count reliability: how much can we trust hit counts?
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Hi-index | 0.00 |
In this paper, we investigate the trustworthiness of search engines' hit counts, numbers returned as search result counts. Since many studies adopt search engines' hit counts to estimate the popularity of input queries, the reliability of hit counts is indispensable for archiving trustworthy studies. However, hit counts are unreliable because they change, when a user clicks the "Search" button more than once or clicks the "Next" button on the search results page, or when a user queries the same term on separate days. In this paper, we analyze the characteristics of hit count transition by gathering various types of hit counts over two months by using 10,000 queries. The results of our study show that the hit counts with the largest search offset just before search engines adjust their hit counts are the most reliable. Moreover, hit counts are the most reliable when they are consistent over approximately a week.