FiVaTech: Page-Level Web Data Extraction from Template Pages

Authors:
Mohammed Kayed;Chia-Hui Chang
Affiliations:
Beni-Suef Universiy, Giza;National Central University, Chung-Li
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2010

Citing 0
Cited 15

Little knowledge rules the web: domain-centric result page extraction

RR'11 Proceedings of the 5th international conference on Web reasoning and rule systems
AMBER: turning annotations into knowledge

Proceedings of the 21st international conference companion on World Wide Web
Peer matrix alignment: a new algorithm

PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Towards a method for unsupervised web information extraction

ICWE'12 Proceedings of the 12th international conference on Web Engineering
Clustering visually similar web page elements for structured web data extraction

ICWE'12 Proceedings of the 12th international conference on Web Engineering
TEX: An efficient and effective unsupervised Web information extractor

Knowledge-Based Systems
DEQA: deep web extraction for question answering

ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part II
An unsupervised technique to extract information from semi-structured web pages

WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Towards discovering ontological models from big RDF data

ER'12 Proceedings of the 2012 international conference on Advances in Conceptual Modeling
Towards discovering conceptual models behind web sites

ER'12 Proceedings of the 31st international conference on Conceptual Modeling
A framework for populating ontological models from semi-structured web documents

ER'12 Proceedings of the 31st international conference on Conceptual Modeling
Towards web-scale structured web data extraction

Proceedings of the sixth ACM international conference on Web search and data mining
Robust detection of semi-structured web records using a DOM structure-knowledge-driven model

ACM Transactions on the Web (TWEB)
Architecture specification of rule-based deep web crawler with indexer

International Journal of Knowledge and Web Intelligence
Effects of Terms Recognition Mistakes on Requests Processing for Interactive Information Retrieval

International Journal of Information Retrieval Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Web data extraction has been an important part for many Web data analysis applications. In this paper, we formulate the data extraction problem as the decoding process of page generation based on structured data and tree templates. We propose an unsupervised, page-level data extraction approach to deduce the schema and templates for each individual Deep Website, which contains either singleton or multiple data records in one Webpage. FiVaTech applies tree matching, tree alignment, and mining techniques to achieve the challenging task. In experiments, FiVaTech has much higher precision than EXALG and is comparable with other record-level extraction systems like ViPER and MSE. The experiments show an encouraging result for the test pages used in many state-of-the-art Web data extraction works.