Using content-based and link-based analysis in building vertical search engines

  • Authors:
  • Michael Chau; Hsinchun Chen

  • Affiliations:
  • School of Business, The University of Hong Kong, Pokfulam, Hong Kong; Department of Management Information Systems, The University of Arizona, Tucson, Arizona

  • Venue:
  • ICADL'04: Proceedings of the 7th International Conference on Digital Libraries: International Collaboration and Cross-Fertilization
  • Year:
  • 2004

Abstract

This paper reports our research on Web page filtering in the development of specialized (vertical) search engines. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as input to various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study showed that the proposed approaches performed better than the benchmark approaches.
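
To make the representation described above concrete, the following is a minimal Python sketch of turning a page into a short fixed-length vector of content-based and link-based features. The abstract does not list the actual features, so everything here is an assumption made for illustration: the `Page` schema, the domain `lexicon`, the helper names `lexicon_ratio` and `page_features`, and the five features themselves are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A fetched Web page plus link context a crawler might record (hypothetical schema)."""
    title: str
    body: str
    inlink_anchor_texts: list = field(default_factory=list)  # anchor text of links pointing here
    outlink_count: int = 0


def lexicon_ratio(text, lexicon):
    """Fraction of whitespace-separated tokens in `text` found in `lexicon`.

    Tokenization is deliberately naive: lowercase and split on whitespace,
    with no punctuation stripping or stemming.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def page_features(page, lexicon):
    """Map a page to a five-element feature vector instead of a bag of words.

    The choice of features is an illustrative assumption, not the paper's set.
    """
    anchors = " ".join(page.inlink_anchor_texts)
    return [
        lexicon_ratio(page.title, lexicon),     # content-based: title
        lexicon_ratio(page.body, lexicon),      # content-based: body text
        lexicon_ratio(anchors, lexicon),        # link-based: anchor text of in-links
        float(len(page.inlink_anchor_texts)),   # link-based: number of in-links
        float(page.outlink_count),              # link-based: number of out-links
    ]


# Toy example for a hypothetical medical vertical search engine.
lexicon = {"cancer", "tumor", "oncology"}
page = Page(
    title="Oncology news",
    body="New tumor study published today",
    inlink_anchor_texts=["cancer research", "home"],
    outlink_count=3,
)
features = page_features(page, lexicon)
```

A vector like `features`, paired with a relevant/irrelevant label for each training page, is the kind of input that a neural network or support vector machine classifier could then be trained on. The vector length stays fixed regardless of vocabulary size, which is the practical difference from a bag-of-words representation.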