Web data extraction is an important component of many Web data analysis applications. In this paper, we formulate data extraction as the decoding of the page-generation process, in which structured data are encoded into Web pages through tree templates. We propose FiVaTech, an unsupervised, page-level data extraction approach that deduces the schema and templates of each individual Deep Web site, whose pages may contain either a single data record or multiple data records. FiVaTech applies tree matching, tree alignment, and mining techniques to accomplish this challenging task. In our experiments, FiVaTech achieves much higher precision than EXALG and is comparable to record-level extraction systems such as ViPER and MSE. The results are encouraging on the test pages used in many state-of-the-art Web data extraction studies.
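To make the "tree matching" step more concrete, here is a minimal sketch of the classic Simple Tree Matching (STM) algorithm, a dynamic-programming method commonly used to compare DOM subtrees in Web data extraction. It is offered as an illustration of the general technique only and is not claimed to be FiVaTech's exact procedure. The names `Node`, `simple_tree_matching`, `tree_size`, and `similarity` are hypothetical, and normalizing by the average tree size is an assumed choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag label plus ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down, order-preserving matching between a and b.

    If the root labels differ, no node in the two trees can be matched.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a
    # and the first j children of b (an edit-distance-like table).
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j],                      # skip a child of a
                M[i][j - 1],                      # skip a child of b
                M[i - 1][j - 1]                   # pair the two children
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def tree_size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(tree_size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched nodes divided by the average tree size."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)


# Usage: two table rows that differ only in the content of their second cell.
row1 = Node("tr", [Node("td", [Node("#text")]), Node("td", [Node("a")])])
row2 = Node("tr", [Node("td", [Node("#text")]), Node("td", [Node("#text")])])
print(simple_tree_matching(row1, row2), similarity(row1, row2))  # prints 4 0.8
```

Subtrees whose similarity exceeds some threshold can then be treated as instances of the same template fragment; the threshold value would be a tuning assumption rather than something stated in the abstract.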