Looking at the Web through XML Glasses

Authors:
Arnaud Sahuguet; Fabien Azavant
Affiliations:
-;-
Venue:
COOPIS '99 Proceedings of the Fourth IFCIS International Conference on Cooperative Information Systems
Year:
1999

Citing 0
Cited 11

Babel: An XML-Based Application Integration Framework
CAiSE '02 Proceedings of the 14th International Conference on Advanced Information Systems Engineering

Task-Structure Based Mediation: The Travel-Planning Assistant Example
AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence

A Heuristic Approach for Converting HTML Documents to XML Documents
CL '00 Proceedings of the First International Conference on Computational Logic

A Unified Framework for Wrapping, Mediating and Restructuring Information from the Web
ER '99 Proceedings of the Workshops on Evolution and Change in Data Management, Reverse Engineering in Information Systems, and the World Wide Web and Conceptual Modeling

A uniform framework for integration of information from the web
Information Systems - Special issue on web data integration

Intelligent knowledge extraction from the web
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems - Intelligent information systems

Querying relational databases through XSLT
Data & Knowledge Engineering

Information extraction from structured documents using k-testable tree automaton inference
Data & Knowledge Engineering

Personalized recommendation of related content based on automatic metadata extraction
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds

Web wrapper validation
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications

PIES: a web information extraction system using ontology and tag patterns
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management

Abstract

The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and make information accessible to applications, in order to offer automation, inter-operation and Web-awareness among services.

To do so, information from Web sources needs to be accessible in a structured way. XML and its various extensions (data-models, query languages) are a step in this direction. Unfortunately, the Web is not yet a well organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, for which structure has to be extracted.

To address this problem, we present the World Wide Web Wrapper Factory (W4F), a Java toolkit for the generation of wrappers for Web sources. Our main contributions are: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to XML documents, with the automatic generation of the corresponding DTDs; (3) some visual supports to make the engineering of wrappers faster and easier. As an illustration, we show how we can, via W4F inter-mediation, transparently query HTML sources from an XML query language.
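
To make the idea of a wrapper concrete, here is a minimal sketch in Java of the general pattern the abstract describes: apply an extraction rule to an HTML page, then map the extracted values onto XML elements. It does not use W4F's extraction language or API, neither of which is shown in this record. The class name `WrapperSketch`, the regular-expression rule, the sample table, and the element names `items`, `item`, `title`, and `price` are all assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapperSketch {
    // Hypothetical extraction rule (not W4F syntax): each table row is
    // assumed to hold exactly two cells, a title and a price.
    private static final Pattern ROW = Pattern.compile(
        "<tr>\\s*<td>(.*?)</td>\\s*<td>(.*?)</td>\\s*</tr>", Pattern.DOTALL);

    // Escape XML markup characters. Assumes the cell text is plain text
    // that does not already contain HTML entities.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Map every matched row onto an <item> element (element names are
    // illustrative, not taken from the paper).
    static String toXml(String html) {
        StringBuilder xml = new StringBuilder("<items>");
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            xml.append("<item><title>").append(escape(m.group(1).trim()))
               .append("</title><price>").append(escape(m.group(2).trim()))
               .append("</price></item>");
        }
        return xml.append("</items>").toString();
    }

    public static void main(String[] args) {
        // Made-up sample page standing in for a real Web source.
        String html = "<table>"
            + "<tr><td>First Book</td><td>29.95</td></tr>"
            + "<tr><td>Second Book</td><td>34.50</td></tr>"
            + "</table>";
        System.out.println(toXml(html));
    }
}
```

A hand-written regular expression like this breaks as soon as the page layout changes, which is the volatility problem the abstract points to and the reason it argues for a dedicated, declarative extraction language.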