RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Automatic information extraction from large websites
Journal of the ACM (JACM)
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering
Extracting lists of data records from semi-structured web pages
Data & Knowledge Engineering
FiVaTech: Page-Level Web Data Extraction from Template Pages
IEEE Transactions on Knowledge and Data Engineering
Harvesting relational tables from lists on the web
Proceedings of the VLDB Endowment
TEX: An efficient and effective unsupervised Web information extractor
Knowledge-Based Systems
Hi-index | 0.00 |
We propose a technique that takes two or more web pages generated by the same server-side template and tries to learn a regular expression that represents it and helps extract relevant information from similar pages. Our experimental results on real-world web sites demonstrate that our technique outperforms others in terms of both effectiveness and efficiency and is not affected by HTML errors.