Link-Local features for hypertext classification

Authors:
Hervé Utard;Johannes Fürnkranz
Affiliations:
Knowledge Engineering Group, TU Darmstadt, Darmstadt, Germany;Knowledge Engineering Group, TU Darmstadt, Darmstadt, Germany
Venue:
EWMF'05/KDO'05 Proceedings of the 2005 joint international conference on Semantics, Web and Mining
Year:
2005

Abstract

Previous work in hypertext classification has resulted in two principal approaches for incorporating information about the graph properties of the Web into the training of a classifier. The first approach uses the complete text of the neighboring pages, whereas the second uses only their class labels. In this paper, we argue that both approaches are unsatisfactory: the first brings in too much irrelevant information, while the second is too coarse because it abstracts the entire page into a single class label. Instead, one needs to focus on the relevant parts of predecessor pages, namely the region in the neighborhood of the origin of an incoming link. To this end, we investigate different ways of extracting such features and compare several techniques for using them in a text classifier.
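
As a rough illustration of what a feature taken from the neighborhood of a link's origin could look like, the following minimal Python sketch collects, for each link on a predecessor page that points to a given target page, the anchor text and a fixed window of words on either side. It is not the feature set evaluated in the paper: the names LinkContextParser and link_local_features, the window size, the exact-match comparison of href values, and the example URLs are all assumptions made for this illustration.

```python
from html.parser import HTMLParser
import re


class LinkContextParser(HTMLParser):
    """Collects the words of a page and records where links to one target sit."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.tokens = []          # all words of the page, in document order
        self.spans = []           # (start, end) token ranges of matching anchors
        self._open_start = None   # token index where the current matching <a> began

    def handle_starttag(self, tag, attrs):
        # Assumption: a link "points to the target" if its href matches exactly.
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._open_start = len(self.tokens)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_start is not None:
            self.spans.append((self._open_start, len(self.tokens)))
            self._open_start = None

    def handle_data(self, data):
        # Simplification: every text node is tokenised, including script/style.
        self.tokens.extend(re.findall(r"\w+", data.lower()))


def link_local_features(html, target_url, window=3):
    """Return anchor words plus `window` words before and after each matching link."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    toks = parser.tokens
    features = []
    for start, end in parser.spans:
        features.append({
            "before": toks[max(0, start - window):start],
            "anchor": toks[start:end],
            "after": toks[end:end + window],
        })
    return features
```

Called on a predecessor page and the URL of the page to be classified, the function returns one dictionary per incoming link. The words in these dictionaries could then be added to the target page's feature vector, in contrast to using the predecessor's full text or only its class label.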