Information Extraction from Semi-structured WEB Page Based on DOM Tree and its Application in Scientific Literature Statistical Analysis System

  • Authors:
  • WeiDong Li;Yibing Dong;RuiJiang Wang;HongXia Tian

  • Venue:
  • SSME '09 Proceedings of the 2009 IITA International Conference on Services Science, Management and Engineering
  • Year:
  • 2009

Abstract

To extract information automatically from semi-structured web pages, this paper proposes IESS, a method that discovers the record model using the DOM tree and the Maximal Similar Sub Tree. IESS identifies records automatically and correctly even when records of the same type differ somewhat in how they are expressed. To test the method's performance, a scientific literature statistical analysis system was designed. In practice, the system helps users quickly understand the distribution of papers in their field of retrieval and identify what is important.
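
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea of finding repeated, similar subtrees in a DOM tree. It is not the IESS method itself. It assumes that sibling subtrees can be compared by the similarity of their flattened tag sequences, and every name in it (Node, TreeBuilder, find_record_region, the 0.7 threshold, the sample page) is hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal DOM-like node: a tag name, a parent, and child nodes."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tag-only tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def tag_sequence(node):
    """Flatten a subtree into its pre-order list of tag names."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def similarity(a, b):
    """Assumed similarity measure: ratio of matching tags in both sequences."""
    return SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()


def find_record_region(root, threshold=0.7, min_records=2):
    """Return (parent, records) for the largest group of similar siblings."""
    best_score, best_parent, best_records = 0, None, []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        for ref in node.children:
            records = [c for c in node.children
                       if similarity(ref, c) >= threshold]
            score = sum(len(tag_sequence(c)) for c in records)
            if len(records) >= min_records and score > best_score:
                best_score, best_parent, best_records = score, node, records
    return best_parent, best_records


SAMPLE_HTML = """
<html><body>
<div><a href="/">Home</a></div>
<ul>
  <li><a href="p1">Paper One</a><span>2007</span></li>
  <li><a href="p2">Paper Two</a><span>2008</span><em>cited</em></li>
  <li><a href="p3">Paper Three</a><span>2009</span></li>
</ul>
</body></html>
"""

builder = TreeBuilder()
builder.feed(SAMPLE_HTML)
parent, records = find_record_region(builder.root)
```

In the sample page the second list item carries an extra em element, yet it is still grouped with its two siblings because the tag sequences remain mostly alike. This mirrors the abstract's point about tolerating differences between records of the same type, though the paper's actual similarity criterion may differ.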