The literature provides a variety of techniques for building the information extractors on which some data integration systems rely. These techniques are usually based on extraction rules, which must be maintained and adapted whenever the web sources change. We present our preliminary steps towards an unsupervised information extraction technique that searches web documents for shared patterns and uses them to fragment the documents repeatedly until it isolates the relevant information to be extracted. Experimental results on 1230 real-world web documents show that our system runs quickly and achieves promising results.
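
The idea of searching documents for shared patterns and fragmenting them around those patterns can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes that documents are already tokenised into lists of strings, that the longest token sequence common to all documents is template text, and that fragments with no shared pattern left are the data to extract. The names `find_shared_pattern`, `index_of`, `extract`, and the `max_len` bound are hypothetical.

```python
from typing import List, Optional

Tokens = List[str]


def index_of(doc: Tokens, pattern: Tokens) -> int:
    """Return the first position of pattern inside doc, or -1 if absent."""
    for i in range(len(doc) - len(pattern) + 1):
        if doc[i:i + len(pattern)] == pattern:
            return i
    return -1


def find_shared_pattern(docs: List[Tokens], max_len: int = 8) -> Optional[Tokens]:
    """Return the longest token sequence (at most max_len tokens, an assumed
    bound) that occurs in every document, or None if there is none."""
    base = min(docs, key=len)  # a shared pattern must fit in the shortest document
    for size in range(min(max_len, len(base)), 0, -1):
        for start in range(len(base) - size + 1):
            candidate = base[start:start + size]
            if all(index_of(doc, candidate) >= 0 for doc in docs):
                return candidate
    return None


def extract(docs: List[Tokens]) -> List[List[Tokens]]:
    """Split the documents around shared patterns, recursively, and return
    the fragments that share nothing, one group per extracted field."""
    if all(len(doc) == 0 for doc in docs):
        return []
    pattern = find_shared_pattern(docs)
    if pattern is None:
        # Assumption: fragments without any shared pattern are data.
        return [docs]
    prefixes, suffixes = [], []
    for doc in docs:
        i = index_of(doc, pattern)
        prefixes.append(doc[:i])
        suffixes.append(doc[i + len(pattern):])
    return extract(prefixes) + extract(suffixes)
```

Applied to two token lists that differ only in a title and a price, `extract` returns one group of fragments per varying field. A real extractor would also have to handle values that happen to coincide across documents, which this sketch would mistake for template text.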