Recognition of Data Records in Semi-structured Web-Pages Using Ontology and χ² Statistical Distribution

Authors:
Amin Keshavarzi; Amir Masoud Rahmani; Mehran Mohsenzadeh; Reza Keshavarzi
Affiliations:
Islamic Azad University, Marvdasht Branch, Iran; Islamic Azad University, Science and Research Branch, Tehran, Iran; Islamic Azad University, Science and Research Branch, Tehran, Iran; University of Isfahan, Iran
Venue:
ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Year:
2008

Abstract

Information extraction (IE) has emerged as a novel discipline in computer science. In IE, intelligent algorithms are employed to extract the required data and structure them so that they are suitable for querying. Most IE systems use the structure of a web page, e.g. its HTML tags, to recognize the sought information. In this article, an algorithm is developed to recognize the main region of a web page containing the sought information by means of an ontology, the web-page structure, and the χ² goodness-of-fit test. After the main region is recognized, the records within it are identified, and each record is written to a text file.
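The abstract does not spell out how the χ² goodness-of-fit test is applied, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes each candidate region of a page has been reduced to counts of terms matching a few ontology concepts, and that an expected share per concept is known. The function names, concept order, and all numbers are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def best_fitting_region(regions, expected_share):
    """Score every region and return (name of the best fit, all scores).

    regions maps a region name to its per-concept term counts.
    expected_share is the assumed share of each concept (sums to 1).
    A lower statistic means the region fits the expected shares better.
    """
    scores = {}
    for name, counts in regions.items():
        total = sum(counts)
        if total == 0:
            continue  # a region with no ontology terms cannot be scored
        expected = [total * share for share in expected_share]
        scores[name] = chi_square_statistic(counts, expected)
    if not scores:
        raise ValueError("no region contains any ontology terms")
    return min(scores, key=scores.get), scores


# Hypothetical concepts, in order: title, price, author.
expected_share = [0.5, 0.3, 0.2]
regions = {
    "sidebar": [9, 0, 1],
    "main": [10, 6, 4],
}

best, scores = best_fitting_region(regions, expected_share)
print(best, scores)
```

In this made-up example the "main" region matches the expected shares exactly, while "sidebar" scores about 6.7. With three concepts the test has two degrees of freedom, so at the 0.05 significance level (critical value 5.991) the sidebar would be rejected as a poor fit.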