Exploiting Structural Information for Text Classification on the WWW

Authors:
Johannes Fürnkranz
Venue:
IDA '99 Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis
Year:
1999

Citing 5

Pruning Algorithms for Rule Learning
Machine Learning

ParaSite: mining structural information on the Web
Selected papers from the sixth international conference on World Wide Web

Learning to extract symbolic knowledge from the World Wide Web
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence

Combining Statistical and Relational Methods for Learning in Hypertext Domains
ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming

Learning trees and rules with set-valued features
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

Cited 5

Statistical Relational Learning for Document Mining
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

Automatic Recognition of News Web Pages
PAISI, PACCF and SOCO '08 Proceedings of the IEEE ISI 2008 PAISI, PACCF, and SOCO international workshops on Intelligence and Security Informatics

Classifying documents with link-based bibliometric measures
Information Retrieval

A unified representation of web logs for mining applications
Information Retrieval

Classifying web data in directory structures
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development

Abstract

In this paper, we report on a set of experiments that explore the utility of the structural information in WWW documents. Our working hypothesis is that it is often easier to classify a hypertext page using information from the pages that point to it than using information on the page itself. We present experimental evidence that confirms this hypothesis on a set of Web pages relating to Computer Science departments.
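
The hypothesis can be illustrated with a minimal sketch, which is not the method used in the paper. It assumes a toy link list and a hand-written keyword profile per class. The page names, anchor texts, class labels, and the helpers inlink_features and classify are all hypothetical, and a real system would learn the profiles from training data. The point is only that the features for a target page come from the pages linking to it, not from the page itself.

```python
from collections import Counter

# Hypothetical toy data: (source_page, target_page, anchor_text) triples.
LINKS = [
    ("dept/index.html", "people/smith.html", "Prof. Smith faculty home page"),
    ("dept/faculty.html", "people/smith.html", "Smith professor"),
    ("dept/index.html", "courses/cs101.html", "CS101 course syllabus"),
    ("people/smith.html", "courses/cs101.html", "my course on programming"),
]

# Hypothetical keyword profiles per class (assumption: hand-written, not learned).
PROFILES = {
    "faculty": {"prof.", "professor", "faculty"},
    "course": {"course", "syllabus", "cs101"},
}


def inlink_features(target, links):
    """Count words in the anchor text of every link that points to `target`."""
    words = Counter()
    for _source, dest, anchor in links:
        if dest == target:
            words.update(anchor.lower().split())
    return words


def classify(target, links, profiles):
    """Pick the class whose keywords occur most often in the in-link anchor text."""
    feats = inlink_features(target, links)
    scores = {
        label: sum(feats[word] for word in keywords)
        for label, keywords in profiles.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for page in ("people/smith.html", "courses/cs101.html"):
        print(page, classify(page, LINKS, PROFILES))
```

Note that the content of the target page is never read: the classifier sees only what other pages say about it.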