News information extraction based on adaptive weighting using unsupervised Bayesian algorithm

  • Authors:
  • Shilin Huang;Xiaolin Zheng;Xiaowei Wang;Deren Chen

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China;College of Computer Science and Technology, Zhejiang University, Hangzhou, China

  • Venue:
  • WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
  • Year:
  • 2011

Abstract

Information extraction is an important part of web information retrieval. News pages have no representative keywords that mark where an article begins and ends, so it is difficult to identify the news title and body automatically. Our approach addresses this problem with a Bayesian algorithm that uses an adaptive weighting factor. We divided each news page into text fragments and represented every fragment with a set of content features and layout features. The adaptive weighting factor adjusts how much each feature counts so that the features fit different pages. Experiments show that, on the task of news information extraction, our method achieves higher precision than the original algorithm without a weighting factor.
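The abstract does not give the feature set, the probabilities, or the formula for the weighting factor, so the following is only a minimal sketch of the general idea, not the authors' method. It scores each text fragment with a naive-Bayes-style log-odds sum in which every feature's contribution is scaled by a per-page weight. The feature names, the likelihood values in `LIKELIHOODS`, the prior `PRIOR_BODY`, and the weighting rule in `adaptive_weights` are all hypothetical assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    features: dict  # feature name -> bool (hypothetical content/layout features)


# Hypothetical likelihoods: (P(feature | body), P(feature | not body)).
# These numbers are invented for illustration, not taken from the paper.
LIKELIHOODS = {
    "long_text": (0.8, 0.2),          # content feature
    "has_punctuation": (0.9, 0.4),    # content feature
    "low_link_density": (0.85, 0.3),  # layout feature
}
PRIOR_BODY = 0.3  # assumed prior probability that a fragment is body text


def adaptive_weights(fragments):
    """Assumed weighting rule: a feature that is present in every fragment
    (or in none) cannot discriminate on this page and gets weight 0; a
    feature present in half of the fragments gets weight 1."""
    n = len(fragments)
    weights = {}
    for name in LIKELIHOODS:
        p = sum(f.features[name] for f in fragments) / n
        weights[name] = 4 * p * (1 - p)
    return weights


def body_log_odds(fragment, weights):
    """Weighted naive-Bayes log-odds that a fragment belongs to the body."""
    score = math.log(PRIOR_BODY / (1 - PRIOR_BODY))
    for name, (p_body, p_other) in LIKELIHOODS.items():
        if fragment.features[name]:
            ratio = p_body / p_other
        else:
            ratio = (1 - p_body) / (1 - p_other)
        score += weights[name] * math.log(ratio)
    return score


def extract_body(fragments):
    """Keep the fragments whose weighted log-odds favour the body class."""
    weights = adaptive_weights(fragments)
    return [f for f in fragments if body_log_odds(f, weights) > 0]


def make(text, long_text, punct, low_links):
    return Fragment(text, {
        "long_text": long_text,
        "has_punctuation": punct,
        "low_link_density": low_links,
    })


page = [
    make("Home | World | Sports", False, False, False),
    make("The council voted on Tuesday to approve the budget.", True, True, True),
    make("Officials said the plan takes effect next year.", True, True, True),
    make("Copyright 2011", False, False, True),
]
body = extract_body(page)
```

Because the weights are recomputed from the fragments of each page, the same feature can count heavily on one page and hardly at all on another, which is the sense in which the weighting in this sketch is adaptive.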