Perception-oriented online news extraction

  • Authors:
  • Jinlin Chen;Keli Xiao

  • Affiliations:
  • City University of New York, New York, NY, USA;City University of New York, New York, NY, USA

  • Venue:
  • Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

A novel online news extraction approach based on human perception is presented in this paper. The approach simulates how a human perceives and identifies online news content. It first detects news areas based on content function, space continuity, and formatting continuity of news information. It further identifies detailed news content based on the position, format, and semantic of detected news areas. Experiment results show that our approach achieves much better performance (in average more than 99% in terms of F1 Value) compared to previous approaches such as Tree Edit Distance and Visual Wrapper based approaches. Furthermore, our approach does not assume the existence of Web templates in the tested Web pages as required by Tree Edit Distance based approach, nor does it need training sets as required in Visual Wrapper based approach. The success of our approach demonstrates the strength of the perception-oriented Web information extraction methodology and represents a promising approach for automatic information extraction from sources with presentation design for humans.