Statistical approach to estimate the quality of web datasets
CIMMACS'05 Proceedings of the 4th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Domain-specific search engines are becoming popular because they offer greater accuracy than general-purpose search engines. In this study, we developed a method for collecting domain-specific documents from the Web in order to improve search results. The core of our approach is to use several metrics to estimate the relevance, with respect to a topic of interest, of every document automatically discovered by a crawler. This approach yielded two important findings. First, the time required to manually analyze the content of documents retrieved by the crawler was significantly reduced; second, the content quality of the selected documents improved. These results suggest that the rough estimates of precision and recall calculated in this study are promising.
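The abstract does not specify which relevance metrics the crawler uses, but the general idea of scoring each discovered document against a topic profile can be sketched as below. This is a minimal illustration, assuming cosine similarity over term-frequency vectors and a hypothetical acceptance threshold; the function names and the threshold value are not from the paper.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relevance_score(doc_terms, topic_terms, threshold=0.2):
    """Estimate whether a crawled document is on-topic.

    Returns (score, keep): a focused crawler would retain the
    document and expand its links only when keep is True.
    The threshold is an illustrative assumption.
    """
    score = cosine_similarity(Counter(doc_terms), Counter(topic_terms))
    return score, score >= threshold

# Hypothetical topic profile for a "digital libraries" crawl.
topic = ["digital", "library", "crawler", "search", "retrieval"]
on_topic = ["digital", "library", "search", "indexing", "retrieval"]
off_topic = ["football", "score", "league", "match"]

s1, keep1 = relevance_score(on_topic, topic)   # high overlap -> kept
s2, keep2 = relevance_score(off_topic, topic)  # no overlap -> discarded
```

In practice such a score lets the crawler filter documents automatically, which is what reduces the manual content-analysis effort the abstract reports.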