A Voting Method for the Classification of Web Pages

Authors:
Rui Fang;Alexander Mikroyannidis;Babis Theodoulidis
Affiliations:
University of Manchester, United Kingdom;University of Manchester, United Kingdom;University of Manchester, United Kingdom
Venue:
WI-IATW '06 Proceedings of the 2006 IEEE/WIC/ACM international conference on Web Intelligence and Intelligent Agent Technology
Year:
2006

Citing 12

Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)

A practical hypertext catergorization method using links and incrementally available class information
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval

Using web structure for classifying and describing web pages
Proceedings of the 11th international conference on World Wide Web

Web classification using support vector machine
Proceedings of the 4th international workshop on Web information and data management

Learning Logical Definitions from Relations
Machine Learning

Composite Kernels for Hypertext Categorisation
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Discovering Test Set Regularities in Relational Domains
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning

Web-page classification through summarization
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

A Theoretical Framework and an Implementation Architecture for Self Adaptive Web Sites
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence

When are links useful? experiments in text classification
ECIR'03 Proceedings of the 25th European conference on IR research

Cited 1

Extraction and classification of dense implicit communities in the Web graph
ACM Transactions on the Web (TWEB)

Abstract

This paper discusses web page classification using hypertext features such as the text of the web page, its title, headings, URL, and anchor text. Five SVM-based classification approaches, each using an individual feature or a combination of features, are investigated on the LookSmart dataset. Initial experimental results show that combining the features improves classifier performance and that some features, such as the title and headings, can be very useful for certain tasks. On the basis of this analysis, we propose a voting method that further improves performance over the individual classifiers.
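
The abstract does not say how the votes of the individual classifiers are combined, so the following is only a minimal sketch under the assumption of plain majority voting. Each per-feature classifier (for example one SVM per feature) is assumed to have already produced a label for the page. The function name majority_vote, the feature keys, the category labels, and the tie-break rule are all illustrative assumptions and not the authors' method.

```python
from collections import Counter


def majority_vote(predictions, tie_break_order=None):
    """Combine per-feature predictions for one page by majority vote.

    predictions: dict mapping a feature name (hypothetical keys such as
        "text", "title", "url") to the label predicted from that feature.
    tie_break_order: optional list of feature names, most trusted first,
        used only when several labels share the highest vote count.
        Defaults to the insertion order of `predictions` (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    counts = Counter(predictions.values())
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]

    # Tie: take the label proposed by the most trusted feature among the tied labels.
    order = tie_break_order or list(predictions)
    for feature in order:
        if predictions.get(feature) in tied:
            return predictions[feature]
    return tied[0]


# Hypothetical labels predicted for one page by five per-feature classifiers.
page_votes = {
    "text": "Sports",
    "title": "Sports",
    "headings": "Health",
    "url": "Sports",
    "anchor": "Health",
}
winner = majority_vote(page_votes)
```

Here three of the five classifiers agree, so the combined label is the one they share. A weighted scheme, in which each classifier's vote counts in proportion to its validation accuracy, would be a natural variant, but nothing in the abstract indicates which scheme the paper uses.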