Statistical approach to estimate the quality of web datasets
CIMMACS'05 Proceedings of the 4th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Domain-specific search engines are becoming popular because they offer greater accuracy than general-purpose search engines. In this study, we developed a method for collecting domain-specific documents from the Web in order to improve search results. The core of our approach is to use several metrics to estimate the relevance, with respect to a topic of interest, of every document automatically discovered by a crawler. This approach yielded two important findings. First, the time required to manually analyze the content of documents retrieved by the crawler was significantly reduced; second, the content quality of the selected documents improved. These results suggest that the rough estimates of precision and recall calculated in this study are promising.
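The abstract does not specify which relevance metrics the crawler uses, but the general idea of scoring each discovered document against a topic profile can be sketched as below. This is a minimal illustration, assuming cosine similarity over term-frequency vectors and a hypothetical acceptance threshold; the function names and the threshold value are not from the paper.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relevance_score(doc_terms, topic_terms, threshold=0.2):
    """Estimate whether a crawled document is on-topic.

    Returns (score, keep): a focused crawler would retain the
    document and expand its links only when keep is True.
    The threshold is an illustrative assumption.
    """
    score = cosine_similarity(Counter(doc_terms), Counter(topic_terms))
    return score, score >= threshold

# Hypothetical topic profile for a "digital libraries" crawl.
topic = ["digital", "library", "crawler", "search", "retrieval"]
on_topic = ["digital", "library", "search", "indexing", "retrieval"]
off_topic = ["football", "score", "league", "match"]

s1, keep1 = relevance_score(on_topic, topic)   # high overlap -> kept
s2, keep2 = relevance_score(off_topic, topic)  # no overlap -> discarded
```

In practice such a score lets the crawler filter documents automatically, which is what reduces the manual content-analysis effort the abstract reports.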