Recognition of Data Records in Semi-structured Web-Pages Using Ontology and χ² Statistical Distribution

  • Authors:
  • Amin Keshavarzi; Amir Masoud Rahmani; Mehran Mohsenzadeh; Reza Keshavarzi

  • Affiliations:
  • Islamic Azad University, Marvdasht Branch, Iran; Islamic Azad University, Science and Research Branch, Tehran, Iran; Islamic Azad University, Science and Research Branch, Tehran, Iran; University of Isfahan, Iran

  • Venue:
  • ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

Information extraction (IE) has emerged as a novel discipline in computer science. In IE, intelligent algorithms are employed to extract the required data and structure it so that it is suitable for querying. In most IE systems, the structure of a web page, such as its HTML tags, is used to recognize the target information. In this article, an algorithm is developed to recognize the main region of a web page that contains the target information, using an ontology, the web-page structure, and a χ² goodness-of-fit test. After the main region is recognized, the records it contains are identified, and each record is written to a text file.
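The abstract does not state which observed and expected frequencies the χ² test compares. As a rough illustration only, the Python sketch below computes the standard Pearson goodness-of-fit statistic, X² = Σ (O_i − E_i)² / E_i, and applies it to a hypothetical heuristic: counts of ontology-term matches in each child block of a candidate region are tested against a uniform distribution, on the assumption that repeated data records carry similar amounts of domain vocabulary. The function names `chi_square_statistic` and `fits_uniform`, the uniform-expectation choice, and the 5% significance level are all assumptions, not the authors' method.

```python
from typing import Sequence

# Upper 5% critical values of the chi-squared distribution, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_uniform(counts: Sequence[int]) -> bool:
    """Hypothetical heuristic: are ontology-term counts spread evenly over k child blocks?

    Returns True when the uniform hypothesis is NOT rejected at the 5% level,
    which this sketch (not the paper) treats as evidence of a record-like region.
    """
    k = len(counts)
    df = k - 1
    if df not in CHI2_CRITICAL_05:
        raise ValueError("this small table supports only 2 to 11 child blocks")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one ontology-term match is required")
    expected = [total / k] * k  # uniform expectation across child blocks
    return chi_square_statistic(counts, expected) <= CHI2_CRITICAL_05[df]
```

For example, term counts of `[5, 6, 4, 5]` over four child blocks give X² = 0.4, well below the critical value 7.815 for 3 degrees of freedom, so the region passes; counts of `[18, 1, 1, 0]` concentrate in one block and are rejected. A real implementation would need the paper's own definition of the observed and expected distributions.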