Integrating semi-structured data into business applications: a web intelligence example

Authors:
Robert Baumgartner;Oliver Frölich;Georg Gottlob;Marcus Herzog;Peter Lehmann
Affiliations:
DBAI, Institute for Information Systems, Vienna Technical University, Vienna, Austria;DBAI, Institute for Information Systems, Vienna Technical University, Vienna, Austria;DBAI, Institute for Information Systems, Vienna Technical University, Vienna, Austria;DBAI, Institute for Information Systems, Vienna Technical University, Vienna, Austria;Department of Information and Communication, Hochschule der Medien, Fachhochschule Stuttgart, Stuttgart, Germany
Venue:
WM'05 Proceedings of the Third Biennial conference on Professional Knowledge Management
Year:
2005

Abstract

The World Wide Web, a vast universe of knowledge, provides public-domain information about market developments and competitor activities. This information is increasingly a critical success factor for enterprises and can be retrieved, for example, from Web sites or online shops. Extraction from these semi-structured information sources is mostly done manually and is very time-consuming. Powerful and user-friendly tools are therefore needed for extracting and integrating information from different Web sources or, more generally, from heterogeneous semi-structured data sources. In this paper we describe how data from public information sources, in particular the World Wide Web, can be automatically retrieved and normalized into structured data formats. We also illustrate how this data can then be automatically integrated into (often complex) Web Intelligence applications.