Hit count reliability: how much can we trust hit counts?

Authors:
Koh Satoh; Hayato Yamana
Affiliations:
Fundamental Science and Engineering, Waseda University, Tokyo, Japan; Fundamental Science and Engineering, Waseda University, Tokyo, Japan
Venue:
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Year:
2012

Abstract

Numerous recent studies rely on the number of search results returned for a query, i.e., the hit count. However, hit counts returned by search engines can vary unnaturally when observed on different days and may contain large errors that affect research depending on them. Such errors can lead to low precision in machine translation, incorrect extraction of synonyms, and other problems. It is therefore essential to evaluate and improve the reliability of hit counts. Several studies have demonstrated this phenomenon; however, none has made clear how much hit counts can be trusted. In this paper, we propose reliability metrics that quantitatively evaluate the reliability of hit counts in order to improve hit count selection. Evaluation results with Google show that our metrics successfully adopt reliable hit counts with 99.8% precision and skip unreliable hit counts with 74.3% precision.