In this paper, we present WPPS, a new configurable Java-based framework for developing web page processing methods. The key innovations of WPPS are (1) a unified ontological model that describes the visual representation of web pages, and (2) an API and abstractions that allow both declarative and object-oriented mechanisms to be used in developing new methods and approaches.