Building Web Information Extraction Tasks
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Many online information sources are available on the Web. Giving machines access to these sources enables many interesting applications, such as using web data in mediators or software agents. Until now, most work on information extraction from the web has concentrated on building wrappers, i.e. programs that reformat presentational HTML data into a more machine-comprehensible format. Although wrappers are an important part of a web information extraction application, they are not sufficient to fully access a source. It is also necessary to set up an infrastructure for building queries, fetching pages, extracting specific links, and so on. In this paper we propose a language called WetDL that describes an information extraction task as a network of operators whose execution performs the desired extraction.
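To make the idea of an operator network concrete, here is a minimal Python sketch. It is not WetDL syntax, which the abstract does not show; it only illustrates how query building, page fetching, link extraction, and a wrapper might be chained. All names (`build_query`, `fetch`, `extract_links`, `extract_title`, `run_network`), the `example.org` URLs, and the in-memory `PAGES` table that stands in for real HTTP access are assumptions made for illustration, and the network is simplified to a linear chain.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/search?q=wrappers":
        '<a href="/item/1">A</a><a href="/item/2">B</a>',
    "http://example.org/item/1": "<h1>First item</h1>",
    "http://example.org/item/2": "<h1>Second item</h1>",
}


def build_query(term):
    """Operator: turn a search term into a query URL."""
    yield "http://example.org/search?" + urlencode({"q": term})


def fetch(url):
    """Operator: map a URL to a (url, html) pair (stubbed by PAGES)."""
    yield url, PAGES[url]


class _LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def extract_links(page):
    """Operator: emit the absolute URL of each link found on a page."""
    base, html = page
    parser = _LinkParser()
    parser.feed(html)
    parser.close()
    for href in parser.links:
        yield urljoin(base, href)


class _TitleParser(HTMLParser):
    """Collects the text found inside <h1> elements."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.parts.append(data)


def extract_title(page):
    """Operator (a tiny wrapper): turn a page into a structured record."""
    url, html = page
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    yield {"url": url, "title": "".join(parser.parts).strip()}


def run_network(operators, seed):
    """Feed every output of one operator into the next one in the chain."""
    items = [seed]
    for op in operators:
        items = [out for item in items for out in op(item)]
    return items


records = run_network(
    [build_query, fetch, extract_links, fetch, extract_title], "wrappers"
)
print(records)
```

Each operator takes one input item and yields zero or more outputs, so the wrapper (`extract_title`) is just one node among several, which mirrors the abstract's point that wrappers alone do not cover the whole extraction task.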