Adopting Wildlife Experiments for Web Evolution Estimations: The Role of an AI Web Page Classifier

Authors:
Ioannis Anagnostopoulos; Photis Stavropoulos
Affiliations:
University of the Aegean, Greece; University of the Aegean, Greece
Venue:
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2006

Cited by (2):
Monitoring the evolution of cached content in Google and MSN. Proceedings of the 16th international conference on World Wide Web
Ranking bias in deep web size estimation using capture recapture method. Data & Knowledge Engineering

Abstract

This paper proposes a statistical approach for estimating the evolution of web pages in directories. The proposal is based on the capture-recapture method used in wildlife biology to study animal, bird or fish populations, modified with the assumptions and amendments needed to apply the experiments to a search engine directory. In these experiments, web pages are treated as animals, and specific types of web pages as particular species whose abundance, birth, death and survival rates are estimated. The population is open: new web pages are submitted to the search engine directory while others are removed from its indexes, resembling immigration and emigration processes in nature. The role of the biologist, who recognizes the species under study and records their history, is assigned to a web page classifier trained on the taxonomy of the Open Directory (DMOZ project). The classifier is a three-layer Probabilistic Neural Network capable of identifying and categorizing web pages on the basis of information filtering. A virtual experiment is simulated based on the classifier's performance over real web pages, and the results are promising.
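
To make the capture-recapture idea concrete, the sketch below estimates the number of pages in one category from two samples. It is only an illustration under stated assumptions: it uses Chapman's bias-corrected Lincoln-Petersen estimator, which assumes a closed population, whereas the paper works with an open population and the abstract does not name its estimator. The function name, the URLs and the sample sizes are hypothetical; each sample stands for the set of pages the classifier assigned to the category during one visit to the directory.

```python
def chapman_estimate(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    Assumes a closed population (no births, deaths or migration between the
    two samples), which is a simplification of the open setting in the paper.
    """
    if min(marked_first, caught_second, recaptured) < 0:
        raise ValueError("counts must be non-negative")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptured cannot exceed either sample size")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical samples: URLs labelled as the category of interest on two visits.
first_sample = {f"http://example.org/page{i}" for i in range(0, 60)}
second_sample = {f"http://example.org/page{i}" for i in range(40, 120)}

# "Recaptured" pages are those seen in both samples.
recaptured = len(first_sample & second_sample)

estimate = chapman_estimate(len(first_sample), len(second_sample), recaptured)
print(f"recaptured={recaptured}, estimated population={estimate:.1f}")
```

With 60 pages in the first sample, 80 in the second and 20 seen in both, the estimate is about 234 pages. Estimating birth, death and survival rates, as the abstract describes, would need an open-population model and more than two sampling occasions.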