A framework for populating ontological models from semi-structured web documents

  • Authors:
  • Hassan A. Sleiman; Inma Hernández

  • Affiliations:
  • University of Sevilla, Spain; University of Sevilla, Spain

  • Venue:
  • ER'12 Proceedings of the 31st international conference on Conceptual Modeling
  • Year:
  • 2012

Abstract

The Web is the largest repository of information that has ever existed. This information is presented in a human-friendly format using HTML, which complicates its consumption by automatic processes. The Semantic Web and Web Services are solutions to this problem, but the lack of such services on the majority of web sites has increased interest in information extraction, which allows information from web documents to be extracted and structured into ontological models. Despite the large number of proposals on information extraction, no universally applicable information extractor exists. As a consequence, populating an ontological model automatically from a web site often requires more than one information extractor. We propose a framework that supports the development, training, and application of information extractors on semi-structured web documents to produce semantic data. We have developed a version of the framework and verified it by means of experiments on 15 web sites. Experimental results are very promising.