Statistical approach to estimate the quality of web datasets

Authors:
Vitaly Klyuev
Affiliations:
Software Engineering Laboratory, University of Aizu, Aizu-Wakamatsu City, Fukushima, Japan
Venue:
CIMMACS'05 Proceedings of the 4th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Year:
2005


Abstract

Finding appropriate information on the Web is becoming more difficult because the tools currently in use are inefficient. A topic-specific approach to building crawlers is promising. In this paper, we discuss a technique that uses statistical analysis to evaluate the quality of crawled documents. We have found this technique to be more robust, more reliable, more practical, and less subjective than the alternatives.