Content annotation for the semantic web: an automatic web-based approach

  • Authors:
  • David Sánchez; David Isern; Miquel Millan

  • Affiliations:
  • Universitat Rovira i Virgili, Departament d’Enginyeria Informàtica i Matemàtiques, Intelligent Technologies for Advanced Knowledge Acquisition Research Group (ITAKA), Av Pa ... (all three authors)

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2011

Abstract

Semantic annotation is required to add machine-readable content to natural language text. A global initiative such as the Semantic Web depends directly on the annotation of massive amounts of textual Web resources. However, given the volume of those resources, manual semantic annotation of their contents is neither feasible nor scalable. In this paper, we introduce a methodology to partially annotate the textual content of Web resources in an automatic and unsupervised way. It uses several well-established learning techniques and heuristics to discover relevant entities in text and to associate them with classes of an input ontology by means of linguistic patterns. It also relies on the distribution of information on the Web to assess the degree of semantic correlation between entities and classes of the input domain ontology. Special effort has been put into minimizing the number of Web accesses required to evaluate entities, in order to ensure the scalability of the approach. A manual evaluation has been carried out to test the methodology on several domains, showing promising results.
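
To illustrate the idea of scoring entity-class relatedness from Web information distribution, here is a minimal sketch. The abstract does not state which scoring function the authors use, so the sketch assumes a pointwise-mutual-information-style score over page hit counts, which is one common choice. All names (`make_cached_counter`, `pmi_score`, `best_class`), the query syntax, and the numbers in `TOY_COUNTS` are hypothetical. No real search engine is contacted: a dictionary stands in for it, and a cache shows one simple way to keep the number of lookups low.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical hit counts standing in for a Web search engine (illustrative only).
TOY_COUNTS: Dict[str, int] = {
    '"Barcelona"': 1000,
    '"city"': 5000,
    '"river"': 2000,
    '"Barcelona" AND "city"': 400,
    '"Barcelona" AND "river"': 10,
}
TOTAL_PAGES = 100_000  # assumed size of the indexed collection


def make_cached_counter(
    lookup: Callable[[str], int],
) -> Tuple[Callable[[str], int], Dict[str, int]]:
    """Wrap a hit-count lookup so each distinct query is issued only once."""
    cache: Dict[str, int] = {}
    stats = {"lookups": 0}

    def count(query: str) -> int:
        if query not in cache:
            stats["lookups"] += 1  # a real system would access the Web here
            cache[query] = lookup(query)
        return cache[query]

    return count, stats


def pmi_score(entity: str, cls: str, count: Callable[[str], int], total: int) -> float:
    """PMI-style relatedness of an entity and an ontology class label."""
    joint = count(f'"{entity}" AND "{cls}"')
    if joint == 0:
        return float("-inf")  # never co-occur: treat as unrelated
    p_joint = joint / total
    p_entity = count(f'"{entity}"') / total
    p_class = count(f'"{cls}"') / total
    return math.log2(p_joint / (p_entity * p_class))


def best_class(entity: str, classes: List[str], count: Callable[[str], int], total: int) -> str:
    """Pick the candidate class with the highest score for the entity."""
    return max(classes, key=lambda c: pmi_score(entity, c, count, total))


count, stats = make_cached_counter(lambda q: TOY_COUNTS.get(q, 0))
chosen = best_class("Barcelona", ["city", "river"], count, TOTAL_PAGES)
print(chosen, stats["lookups"])
```

With these toy counts, the entity is assigned to "city", and the cache means the count for the entity alone is fetched once even though it is needed for every candidate class.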