Mining web sites using adaptive information extraction

Authors:
Alexiei Dingli;Fabio Ciravegna;David Guthrie;Yorick Wilks
Affiliations:
University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK;University of Sheffield, Regent Court, Sheffield, UK
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Year:
2003

Citing 7

Extracting targeted data from the web
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

On the MSE robustness of batching estimators
Proceedings of the 33rd conference on Winter simulation

MnM: Ontology Driven Semi-automatic and Automatic Support for Semantic Markup
EKAW '02 Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web

S-CREAM - Semi-automatic CREAtion of Metadata
EKAW '02 Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web

Managing Reference: Ensuring Referential Integrity of Ontologies for the Semantic Web
EKAW '02 Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web

Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases

Adaptive information extraction from text by rule induction and generalisation
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2

Cited 6

Mining knowledge from text using information extraction
ACM SIGKDD Explorations Newsletter - Natural language processing and text mining

Ontologies as facilitators for repurposing web documents
International Journal of Human-Computer Studies

Automatic acquisition for sensibility knowledge using co-occurrence relation
International Journal of Computer Applications in Technology

Natural Language Processing as a Foundation of the Semantic Web
Foundations and Trends in Web Science

A data mining based method for web site maintenance
Intelligent Data Analysis

A Document Descriptor Extractor Based on Relevant Expressions
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence

Abstract

Adaptive Information Extraction systems (IES) are currently used by some Semantic Web (SW) annotation tools to support annotation (Handschuh et al., 2002; Vargas-Vera et al., 2002). They are generally based on fully supervised methodologies that require fairly intensive domain-specific annotation. Unfortunately, selecting representative examples can be difficult, and annotations can be incorrect and time-consuming to produce. In this paper we present a methodology that drastically reduces (or even removes) the amount of manual annotation required when annotating consistent sets of pages. A very limited number of user-defined examples are used to bootstrap learning. Simple, high-precision (and possibly high-recall) IE patterns are induced from these examples; the patterns then discover more examples, which in turn discover more patterns, and so on.
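
The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the abstract does not specify how patterns are represented, so the sketch assumes a pattern is simply the fixed-width text immediately to the left and right of a known example, and that a filler never contains markup characters. The function names (induce_patterns, apply_patterns, bootstrap), the width parameter, and the toy staff-list pages are all hypothetical.

```python
import re


def induce_patterns(pages, examples, width=8):
    """Collect (left, right) context pairs around each known example.

    Assumption: a 'pattern' is just the `width` characters on either
    side of an occurrence; real IE systems induce richer rules.
    """
    patterns = set()
    for page in pages:
        for example in examples:
            for m in re.finditer(re.escape(example), page):
                left = page[max(0, m.start() - width):m.start()]
                right = page[m.end():m.end() + width]
                if left and right:
                    patterns.add((left, right))
    return patterns


def apply_patterns(pages, patterns):
    """Return every string that sits between a known left/right context."""
    found = set()
    for left, right in patterns:
        # Assumption: fillers contain no '<' or '>', which keeps matches
        # inside a single text node and favours precision over recall.
        rx = re.compile(re.escape(left) + r"([^<>]+?)" + re.escape(right))
        for page in pages:
            found.update(m.group(1).strip() for m in rx.finditer(page))
    return found


def bootstrap(pages, seeds, max_rounds=5):
    """Alternate pattern induction and example discovery until nothing new appears."""
    examples = set(seeds)
    patterns = set()
    for _ in range(max_rounds):
        patterns |= induce_patterns(pages, examples)
        new_examples = apply_patterns(pages, patterns) - examples
        if not new_examples:
            break
        examples |= new_examples
    return examples, patterns


# Toy input: two pages from a hypothetical, consistently formatted site.
pages = [
    "<li><b>Fabio Ciravegna</b> - staff</li><li><b>Yorick Wilks</b> - staff</li>",
    "<li><b>Alexiei Dingli</b> - staff</li>",
]
examples, patterns = bootstrap(pages, {"Fabio Ciravegna"})
print(sorted(examples))
```

Starting from a single seed, the first round learns the context around "Fabio Ciravegna" and uses it to find the other two names; the second round learns further contexts from those names but finds nothing new, so the loop stops. The sketch relies on the pages sharing a regular layout, which corresponds to the "consistent sets of pages" the abstract refers to.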