A prediction model for web search hit counts using word frequencies

Authors:
Tian Tian;Soon Ae Chun;James Geller
Affiliations:
New Jersey Institute of Technology, USA;City University of New York, USA;New Jersey Institute of Technology, USA
Venue:
Journal of Information Science
Year:
2011

Abstract

A search engine user with a well-defined information need does not want thousands of hits, but rather a few hits that are all highly relevant to the search. Search words often need to be refined and augmented to narrow the results to more relevant pages. However, an overly specific query may return no hits at all, while most typical queries return thousands or even millions of hits; both outcomes are undesirable. This paper suggests a query rewriting method for generating alternative query strings and proposes a hit count prediction model that predicts the number of search engine hits for each alternative query string, based on the English language frequencies of the words in the search terms. Using the hit count prediction model, different search strategies, such as preferring the query with the lowest hit count, can be used to improve users' search experience. We present an evaluation experiment of the hit count prediction model for three major search engines. We also discuss and quantify how far the Google, Yahoo! and Bing search engines diverge from monotonic behaviour, considering negative and positive search terms separately.
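
The abstract does not state the form of the prediction model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that query terms occur independently, so the predicted hit count of a conjunctive query is the index size multiplied by the product of the per-word frequencies. The frequency table, the index size, and the function names `predict_hits` and `pick_lowest_hit_query` are all hypothetical and chosen for illustration.

```python
from math import prod

# Hypothetical per-word frequencies (fraction of pages containing the word).
# Illustrative values only, not measured data.
WORD_FREQ = {"jaguar": 0.001, "car": 0.05, "speed": 0.02}

# Assumed number of pages in the search engine index.
INDEX_SIZE = 1_000_000_000


def predict_hits(terms, word_freq=WORD_FREQ, index_size=INDEX_SIZE):
    """Estimate the hit count of a conjunctive query.

    Assumes the terms occur independently of each other, which is a
    simplification. Raises KeyError for a word missing from word_freq.
    """
    return index_size * prod(word_freq[t] for t in terms)


def pick_lowest_hit_query(candidates, min_hits=1.0):
    """Return the candidate query with the lowest predicted hit count
    that is still at least min_hits, or None if no candidate qualifies."""
    scored = [(query, predict_hits(query)) for query in candidates]
    viable = [(query, hits) for query, hits in scored if hits >= min_hits]
    if not viable:
        return None
    return min(viable, key=lambda pair: pair[1])[0]


# Alternative query strings, as a query rewriting step might produce them.
candidates = [
    ("jaguar",),
    ("jaguar", "car"),
    ("jaguar", "car", "speed"),
]
best = pick_lowest_hit_query(candidates)
```

Under these assumed numbers, each added positive term lowers the predicted count, so the lowest hit count preference selects the most specific candidate that is still predicted to return at least `min_hits` results. Raising `min_hits` guards against queries that are predicted to be too narrow.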