Web page classification with heterogeneous data fusion

  • Authors:
  • Zenglin Xu;Irwin King;Michael R. Lyu

  • Affiliations:
  • Chinese University of Hong Kong;Chinese University of Hong Kong;Chinese University of Hong Kong

  • Venue:
  • Proceedings of the 16th international conference on World Wide Web
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Web pages are more than text and they contain much contextual and structural information, e.g., the title, the meta data, the anchor text,etc., each of which can be seen as a data source or are presentation. Due to the different dimensionality and different representing forms of these heterogeneous data sources, simply putting them together would not greatly enhance the classification performance. We observe that via a kernel function, different dimensions and types of data sources can be represented into acommon format of kernel matrix, which can be seen as a generalized similarity measure between a pair of web pages. In this sense, a kernel learning approach is employed to fuse these heterogeneous data sources. The experimental results on a collection of the ODP database validate the advantages of the proposed method over traditional methods based on any single data source and the uniformly weighted combination of them.