Various studies in NLP and the Semantic Web use the so-called Google count, which is the hit count that a search engine (not only Google) returns for a query. However, the Google count is sometimes unreliable, especially when the count is large or when advanced operators such as OR and NOT are used. In this paper, we propose a novel algorithm that estimates the Google count robustly. It (i) uses the co-occurrence of terms as evidence to estimate the occurrence of a given word, and (ii) integrates multiple pieces of evidence for robust estimation. We evaluated our algorithm on more than 2000 queries from three datasets using the Google, Yahoo!, and MSN search engines. Our algorithm also provides estimated counts for any classifier that judges a web page as positive or negative. Consequently, we can estimate the number of documents on the entire web that refer to a particular person (as distinct from that person's namesakes).
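The two ideas in the abstract, co-occurrence as evidence and the integration of multiple pieces of evidence, can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes that, for each probe term, the hit count of the query combined with that term is available, and that a small sample of pages matching the query has been collected. The function name `estimate_count`, its parameters, and the choice of the median as the combining rule are all hypothetical.

```python
from statistics import median


def estimate_count(cooccurrence_counts, sample_pages):
    """Estimate the hit count of a query from co-occurrence evidence.

    cooccurrence_counts: dict mapping a probe term to the reported hit
        count of "query AND probe term" (assumed input, illustrative only).
    sample_pages: list of sets; each set holds the terms found on one
        sampled page that matches the query (assumed input).
    """
    if not sample_pages:
        raise ValueError("sample_pages must not be empty")

    estimates = []
    for term, joint_count in cooccurrence_counts.items():
        # Fraction of sampled query pages that also contain the probe term,
        # used as an estimate of P(term | query).
        containing = sum(1 for page in sample_pages if term in page)
        if containing == 0:
            continue  # this probe term gives no usable evidence
        ratio = containing / len(sample_pages)
        # count(query) is approximately count(query AND term) / P(term | query).
        estimates.append(joint_count / ratio)

    if not estimates:
        raise ValueError("no probe term appears in the sample")

    # Combine the per-term estimates; the median is an assumed choice that
    # limits the influence of a single unreliable count.
    return median(estimates)


sample = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
counts = {"a": 300, "b": 330, "c": 50}
print(estimate_count(counts, sample))
```

Each probe term yields an independent estimate of the same quantity, so one badly reported co-occurrence count shifts the combined result less than it would if the raw count were used directly.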