Automatic Classification of Web Information Based on Site Structure

Authors:
Gao Kening; Yang Leiming; Zhang Bin; Chai Qiaozi; Ma Anxiang
Affiliations:
Northeastern University, Shenyang, China (all authors)
Venue:
CW '05 Proceedings of the 2005 International Conference on Cyberworlds
Year:
2005

Citing 0
Cited 2

For a few dollars less: identifying review pages sans human labels
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

SUT: Quantifying and mitigating URL typosquatting
Computer Networks: The International Journal of Computer and Telecommunications Networking

Abstract

Automatically classifying the explosively growing volume of web information is becoming a pressing problem. In this paper we propose a new mechanism for automatic classification of web information based on site structure. The mechanism downloads the web pages within a site, records the hyperlinks among them, captures the site structure, extracts the site's own classification system, and links the category information to the corresponding positions in the site structure. Automatic classification can then be realized by matching the positions of the category information with the positions of the web pages. Experiments show that classification based on site structure is more accurate and efficient.
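
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the site structure or the classification system is extracted, so the sketch assumes that the structure is a breadth-first tree over the recorded hyperlinks and that the pages directly below the home page act as the site's categories. The link table `SITE_LINKS`, the labels in `CATEGORY_LABELS`, and both function names are hypothetical, and the download step is replaced by in-memory data.

```python
from collections import deque

# Hypothetical recorded hyperlinks: page -> pages it links to.
# In a real crawl these would come from downloaded pages.
SITE_LINKS = {
    "/": ["/news/", "/sports/"],
    "/news/": ["/", "/news/a.html", "/news/b.html"],
    "/sports/": ["/", "/sports/c.html", "/news/a.html"],
    "/news/a.html": ["/news/"],
}

# Hypothetical category labels attached to positions in the site structure.
CATEGORY_LABELS = {"/news/": "News", "/sports/": "Sports"}


def build_site_tree(links, root):
    """Approximate the site structure as a BFS tree over the hyperlinks.

    Assumption: a page's parent is the first page, in breadth-first
    order from the root, that links to it.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


def classify_pages(parent, root, category_labels):
    """Match each page's position with the category position above it."""
    result = {}
    for page in parent:
        node = page
        # Climb until we reach the ancestor directly below the root.
        while parent[node] is not None and parent[node] != root:
            node = parent[node]
        if node != root and node in category_labels:
            result[page] = category_labels[node]
    return result


if __name__ == "__main__":
    tree = build_site_tree(SITE_LINKS, "/")
    for page, label in sorted(classify_pages(tree, "/", CATEGORY_LABELS).items()):
        print(page, "->", label)
```

In this sketch a page linked from several sections, such as `/news/a.html`, is assigned to the section that reaches it first in breadth-first order. That tie-breaking rule is a choice made for the example, not something stated in the abstract.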