A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis

  • Authors:
  • Hao Han; Takehiro Tokuda

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, Meguro, Tokyo, Japan 152-8552; Department of Computer Science, Tokyo Institute of Technology, Meguro, Tokyo, Japan 152-8552

  • Venue:
  • ICWE '09: Proceedings of the 9th International Conference on Web Engineering
  • Year:
  • 2009

Abstract

Traditional methods for extracting Web news article contents are time-consuming and require considerable maintenance because they analyze the layout of news pages to generate wrappers, either manually or automatically. In this paper, we propose a relevance-based analysis method that extracts article contents from news pages without analyzing the page layouts before extraction. The method is applicable to general news pages, and we present implementations of news extraction for different kinds of news sources.
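
The abstract does not say how relevance is computed, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes, purely for illustration, that a text block is "relevant" when it shares at least `min_overlap` words with the page title. The names `BlockCollector`, `tokens`, `extract_relevant_blocks`, and the `min_overlap` threshold are hypothetical. No site-specific wrapper or layout rule is used.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags delimit candidate text blocks on any page.
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


def tokens(text):
    """Return the set of lowercased words longer than two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}


class BlockCollector(HTMLParser):
    """Collect the page title and the text of each block, ignoring layout."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._buf = []
        self._in_title = False
        self._skip_depth = 0

    def flush_block(self):
        # Normalize whitespace and store the buffered text as one block.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0:
            self._buf.append(data)


def extract_relevant_blocks(html, min_overlap=2):
    """Keep blocks sharing at least min_overlap words with the page title."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush_block()  # store any trailing text after the last tag
    title_words = tokens(parser.title)
    return [b for b in parser.blocks
            if len(tokens(b) & title_words) >= min_overlap]
```

Calling `extract_relevant_blocks(page_html)` on a news page returns the text blocks that overlap with the title, which tends to drop navigation bars and copyright footers. A real system would need a stronger relevance measure than word overlap.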