Automatically classifying the explosively growing volume of web information is becoming an urgent problem. In this paper, we propose a new mechanism for the automatic classification of web information based on site structure. The mechanism downloads the web pages within a web site, records the hyperlinks among those pages, captures the site structure, extracts the classification system of the site itself, and then links the categorizing information to the corresponding positions in the site structure. Automatic classification of web information can then be realized by matching the positions of the categorizing information with the positions of web pages. Experiments show that such classification based on site structure works more accurately and efficiently.
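The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the pages have already been downloaded and reduced to an in-memory link graph, approximates the site structure with a breadth-first tree from the home page, and assigns each page the label of its nearest labelled ancestor. The names `build_site_tree`, `classify_pages`, `links`, and `category_of`, as well as the toy site, are hypothetical.

```python
from collections import deque


def build_site_tree(links, root):
    """Walk the link graph breadth-first from root; return {page: parent}.

    Assumption: the first (shallowest) link that reaches a page defines
    its position in the site structure.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


def classify_pages(parent, category_of):
    """Give each page the label of its nearest labelled ancestor (or itself).

    Pages with no labelled ancestor are mapped to None.
    """
    labels = {}
    for page in parent:
        node = page
        while node is not None and node not in category_of:
            node = parent[node]
        labels[page] = category_of[node] if node is not None else None
    return labels


# Hypothetical toy site: hyperlinks recorded per page.
links = {
    "/": ["/sports", "/tech"],
    "/sports": ["/sports/football.html", "/"],
    "/tech": ["/tech/ai.html", "/sports"],
    "/tech/ai.html": ["/"],
}
# Hypothetical categorizing information taken from the site's own navigation.
category_of = {"/sports": "Sports", "/tech": "Technology"}

parent = build_site_tree(links, "/")
labels = classify_pages(parent, category_of)
for page, label in labels.items():
    print(page, label)
```

In this sketch the back-links to the home page and the cross-link from `/tech` to `/sports` do not change any page's position, because the breadth-first walk keeps only the first parent found for each page.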