Web page classification with heterogeneous data fusion

Authors:
Zenglin Xu; Irwin King; Michael R. Lyu
Affiliations:
Chinese University of Hong Kong; Chinese University of Hong Kong; Chinese University of Hong Kong
Venue:
Proceedings of the 16th international conference on World Wide Web
Year:
2007

Abstract

Web pages are more than text: they contain much contextual and structural information, e.g., the title, the metadata, the anchor text, etc., each of which can be seen as a data source or a representation. Because these heterogeneous data sources differ in dimensionality and representational form, simply putting them together would not greatly enhance classification performance. We observe that, via a kernel function, data sources of different dimensions and types can be represented in the common format of a kernel matrix, which can be seen as a generalized similarity measure between a pair of web pages. In this sense, a kernel learning approach is employed to fuse these heterogeneous data sources. Experimental results on a collection from the ODP database validate the advantages of the proposed method over traditional methods based on any single data source and over their uniformly weighted combination.
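
The idea of mapping each data source to a kernel matrix and then combining the matrices can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel learning procedure, so the sketch swaps in kernel-target alignment as a simple weighting heuristic and compares it with the uniformly weighted baseline. The source names (`title`, `anchor`, `body`), their dimensions, the synthetic features, and all function names are assumptions made for illustration only.

```python
import numpy as np

def cosine_kernel(X):
    """Gram matrix of cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return Xn @ Xn.T

def alignment(K, y):
    """Kernel-target alignment of K with the ideal kernel y y^T (y in {-1, +1})."""
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def alignment_weights(kernels, y):
    """Heuristic stand-in for kernel learning: weight each kernel by its
    (non-negative) alignment; fall back to uniform weights if all are zero."""
    a = np.array([max(alignment(K, y), 0.0) for K in kernels])
    if a.sum() == 0:
        return np.full(len(kernels), 1.0 / len(kernels))
    return a / a.sum()

def fuse(kernels, weights):
    """Convex combination of kernel matrices; the result is again a valid kernel."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * Ki for wi, Ki in zip(w, kernels))

# Synthetic stand-in data: 40 pages, two classes, three hypothetical sources
# with different dimensionality. In practice, the alignment would be computed
# on training labels only.
rng = np.random.default_rng(0)
n = 40
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)

title = rng.random((n, 20))
title[:, :5] += (y == 1)[:, None] * 1.0    # informative source
anchor = rng.random((n, 50))
anchor[:, :5] += (y == 1)[:, None] * 0.5   # weakly informative source
body = rng.random((n, 300))                # uninformative source

# Every source ends up as an n x n matrix, whatever its original dimension.
kernels = [cosine_kernel(X) for X in (title, anchor, body)]

uniform = np.full(len(kernels), 1.0 / len(kernels))
weights = alignment_weights(kernels, y)

K_uniform = fuse(kernels, uniform)
K_fused = fuse(kernels, weights)

print("alignment-based weights:", np.round(weights, 3))
print("alignment of uniform fusion: ", round(alignment(K_uniform, y), 3))
print("alignment of weighted fusion:", round(alignment(K_fused, y), 3))
```

The fused matrix could then be passed to any classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Because a non-negative combination of positive semidefinite matrices is itself positive semidefinite, the fused matrix remains a valid kernel regardless of how the weights are chosen.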