From HTML documents to web tables and rules

Authors:
Kai Simon; Georg Lausen; Harold Boley
Affiliations:
Universität Freiburg, Freiburg i.Br., Germany; Universität Freiburg, Freiburg i.Br., Germany; Institute for Information Technology -- e-Business, Fredericton, NB, Canada
Venue:
ICEC '06 Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet
Year:
2006

References

Logical foundations of object-oriented and frame-based languages. Journal of the ACM (JACM).
A brief survey of web data extraction tools. ACM SIGMOD Record.
FLORID: A Prototype for F-Logic. ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering.
The eXtensible Rule Markup Language. Communications of the ACM - Wireless networking security.
CORDS: automatic discovery of correlations and soft functional dependencies. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data.
Mining approximate functional dependencies and concept similarities to answer imprecise queries. Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004.
ViPER: augmenting automatic information extraction with visual perceptions. Proceedings of the 14th ACM international conference on Information and knowledge management.
BHUNT: automatic discovery of Fuzzy algebraic constraints in relational data. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
The OO jDREW reference implementation of RuleML. RuleML'05 Proceedings of the First international conference on Rules and Rule Markup Languages for the Semantic Web.

Abstract

We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills semi-structured information and reorganizes it into a tabular data structure, which can in turn be browsed and/or submitted to further machine processing. Second, as an example of such processing, the extended knowledge extractor Rex ViPER mines the resulting tables for structural properties and functional dependencies. From these it generates rules that yield a more compact and manageable, and often also enriched, knowledge representation. The resulting fully structured information (RuleML-serialized facts and rules) can be stored along with the original documents, queried by rule engines such as OO jDREW and FLORID, and interchanged between Web Services. Rex ViPER thus contributes to automating the construction of a machine-processable Semantic Web.
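The abstract does not spell out how Rex ViPER mines dependencies or builds rules. As a rough illustration only, the Python sketch below assumes that an extracted table is a list of row dictionaries. It checks exact single-column functional dependencies with a naive scan and compacts one dependency into one rule per distinct left-hand value, so the dependent column no longer has to be stored row by row. It also prints one fact as a RuleML-style `<Atom>`. The sample data, the function names (`holds_fd`, `key_columns`, `mine_unary_fds`, `fd_to_rules`, `fact_to_ruleml`) and the POSL-like rule text are all hypothetical, not the authors' implementation.

```python
from itertools import permutations
import xml.etree.ElementTree as ET

# Hypothetical sample table, as a ViPER-like extractor might produce (invented data).
ROWS = [
    {"product": "X100", "vendor": "Acme", "country": "DE", "price": "19.99"},
    {"product": "X200", "vendor": "Acme", "country": "DE", "price": "9.50"},
    {"product": "Z9", "vendor": "Bolt", "country": "CA", "price": "9.50"},
    {"product": "Z10", "vendor": "Bolt", "country": "CA", "price": "12.00"},
]

def holds_fd(rows, lhs, rhs):
    """True if column lhs exactly determines column rhs in this table instance."""
    seen = {}
    for row in rows:
        # setdefault returns the first rhs value recorded for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

def key_columns(rows):
    """Columns whose values are all distinct; they trivially determine every column."""
    return [c for c in rows[0] if len({r[c] for r in rows}) == len(rows)]

def mine_unary_fds(rows):
    """Non-trivial exact FDs A -> B between single columns (naive pairwise scan)."""
    keys = set(key_columns(rows))
    cols = list(rows[0])
    return [(a, b) for a, b in permutations(cols, 2)
            if a not in keys and holds_fd(rows, a, b)]

def fd_to_rules(rows, lhs, rhs, key):
    """One rule per distinct lhs value, in a POSL-like notation (assumed syntax)."""
    mapping = {}
    for row in rows:
        mapping.setdefault(row[lhs], row[rhs])
    return [f"{rhs}(?{key}, {b}) :- {lhs}(?{key}, {a})." for a, b in mapping.items()]

def fact_to_ruleml(rel, args):
    """Serialize a ground fact as a RuleML-style Atom element (simplified)."""
    atom = ET.Element("Atom")
    ET.SubElement(atom, "Rel").text = rel
    for arg in args:
        ET.SubElement(atom, "Ind").text = arg
    return ET.tostring(atom, encoding="unicode")

if __name__ == "__main__":
    key = key_columns(ROWS)[0]
    fds = mine_unary_fds(ROWS)
    print(fds)
    lhs, rhs = fds[0]
    for rule in fd_to_rules(ROWS, lhs, rhs, key):
        print(rule)
    print(fact_to_ruleml("vendor", [ROWS[0]["product"], ROWS[0]["vendor"]]))
```

On this sample the scan finds `vendor -> country` (and its converse), so the four per-row country values collapse into two rules such as `country(?product, DE) :- vendor(?product, Acme).` The check is exact and holds only for the given instance. Tolerating noisy web tables would require approximate or soft dependencies of the kind targeted by CORDS, which this sketch does not attempt.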