Towards a method for unsupervised web information extraction

Authors:
Hassan A. Sleiman; Rafael Corchuelo
Affiliations:
ETSI Informática, Universidad de Sevilla, Sevilla, Spain; ETSI Informática, Universidad de Sevilla, Sevilla, Spain
Venue:
ICWE'12 Proceedings of the 12th international conference on Web Engineering
Year:
2012


Abstract

The literature provides a variety of techniques for building the information extractors on which some data integration systems rely. These techniques are usually based on extraction rules, which must be maintained and adapted whenever the web sources change. We present our preliminary steps towards an unsupervised information extraction technique that searches web documents for shared patterns and fragments the documents until it finds the relevant information to extract. Experimental results on 1230 real-world web documents show that our system is fast and achieves promising results.
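
The idea of searching documents for shared patterns and fragmenting them can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in the abstract. It is a minimal Python illustration under stated assumptions: documents are tokenised into tags and text runs, the "shared pattern" is taken to be the longest token run (capped at a hypothetical `max_len`) that occurs in every document, and each document is split around the first occurrence of that run. Fragments in which no shared pattern remains are treated as candidate data. All function names here are hypothetical.

```python
import re


def tokenize(html):
    # Split a page into tags and text runs, dropping whitespace-only tokens.
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def find_sub(seq, pat):
    # Index of the first occurrence of token list `pat` in `seq`, or -1.
    n = len(pat)
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] == pat:
            return i
    return -1


def shared_pattern(docs, max_len=8):
    # Assumption: the shared pattern is the longest token run (up to max_len)
    # taken from the shortest document that appears in every document.
    base = min(docs, key=len)
    for size in range(min(max_len, len(base)), 0, -1):
        for start in range(len(base) - size + 1):
            cand = base[start:start + size]
            if all(find_sub(d, cand) >= 0 for d in docs):
                return cand
    return None


def fragment(docs):
    # Recursively split all documents around a shared pattern. When no shared
    # pattern is left, each non-empty fragment is kept as candidate data.
    pattern = shared_pattern(docs)
    if pattern is None:
        return [[" ".join(d)] if d else [] for d in docs]
    left, right = [], []
    for d in docs:
        i = find_sub(d, pattern)
        left.append(d[:i])
        right.append(d[i + len(pattern):])
    return [a + b for a, b in zip(fragment(left), fragment(right))]


def extract_from_pages(pages):
    # Expects at least two pages that were generated from a similar template.
    return fragment([tokenize(p) for p in pages])


pages = [
    "<html><body><h1>Dune</h1><p>Herbert</p></body></html>",
    "<html><body><h1>Emma</h1><p>Austen</p></body></html>",
]
print(extract_from_pages(pages))
```

On the two toy pages above, the shared template tags are stripped away step by step and the varying text is left as the extracted fragments. A realistic system would need to handle repeated records, optional fields, and text that happens to coincide across pages, none of which this sketch attempts.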