We propose ClustVX, a novel approach to extracting structured web data. It clusters visually similar web page elements by exploiting their visual formatting and structural features. The clusters are then used to derive extraction rules. In an experimental evaluation on three publicly available benchmark data sets, ClustVX outperforms state-of-the-art structured data extraction systems.
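
The general idea of grouping page elements by structural and visual similarity can be illustrated with a small sketch. This is not the ClustVX implementation: the abstract does not specify the features or the clustering procedure, so the sketch assumes each element is already reduced to a tag path (structural feature) and a dictionary of style properties (visual feature), and it groups elements whose features match exactly. The names `cluster_elements`, `repeating_clusters`, and the sample data are hypothetical.

```python
from collections import defaultdict

def cluster_elements(elements):
    """Group elements that share the same tag path and the same style.

    Assumption: exact matching on both features stands in for whatever
    similarity measure a real system would use.
    """
    clusters = defaultdict(list)
    for el in elements:
        # Structural feature: the tag path, with positional indices dropped.
        # Visual feature: style properties as a sorted, hashable tuple.
        key = (el["tag_path"], tuple(sorted(el["style"].items())))
        clusters[key].append(el["text"])
    return dict(clusters)

def repeating_clusters(clusters, min_size=2):
    """Keep clusters with at least min_size members.

    Repeated, identically formatted elements are candidates for data
    fields; one-off elements (footers, banners) are dropped.
    """
    return {k: v for k, v in clusters.items() if len(v) >= min_size}

# Hypothetical elements, as if taken from a rendered product listing.
elements = [
    {"tag_path": "html/body/div/h2",
     "style": {"font-size": "18px", "font-weight": "bold"},
     "text": "Camera A"},
    {"tag_path": "html/body/div/span",
     "style": {"color": "red", "font-size": "14px"},
     "text": "$199"},
    {"tag_path": "html/body/div/h2",
     "style": {"font-size": "18px", "font-weight": "bold"},
     "text": "Camera B"},
    {"tag_path": "html/body/div/span",
     "style": {"color": "red", "font-size": "14px"},
     "text": "$249"},
    {"tag_path": "html/body/footer/p",
     "style": {"font-size": "10px"},
     "text": "Contact us"},
]

clusters = cluster_elements(elements)
fields = repeating_clusters(clusters)
for (tag_path, _style), texts in fields.items():
    print(tag_path, texts)
```

In this toy input the two titles fall into one cluster and the two prices into another, while the footer text forms a singleton and is filtered out. Each surviving cluster key (tag path plus style) could then serve as a simple extraction rule for one data field.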