A practical experience concerning the parallel semantic annotation of a large-scale data collection

  • Authors:
  • Javier Fabra; Sergio Hernández; Pedro Álvarez; Estefanía Otero; Juan Carlos Vidal; Manuel Lama

  • Affiliations:
  • Universidad de Zaragoza, Spain (J. Fabra, S. Hernández, P. Álvarez); Universidade de Santiago de Compostela, Spain (E. Otero, J. C. Vidal, M. Lama)

  • Venue:
  • Proceedings of the 9th International Conference on Semantic Systems
  • Year:
  • 2013

Abstract

From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One possible way of dealing with this drawback is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we show how the problem of semantically annotating a large-scale collection of learning objects has been addressed. The terms related to each learning object were processed, producing an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to deal with the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures has been used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
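
The paper describes a framework that integrates heterogeneous computing infrastructures; as a rough, non-authoritative illustration of the kind of workload involved, the Python sketch below annotates the terms of a single learning object against DBpedia's public SPARQL endpoint and parallelizes the lookups with a local thread pool. The endpoint URL, query shape, helper names (`annotate_term`, `annotate_learning_object`), and the thread-pool granularity are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' framework): annotate terms in parallel by
# looking up DBpedia resources through its public SPARQL endpoint and
# collecting the results as RDF-style triples.
from concurrent.futures import ThreadPoolExecutor
from SPARQLWrapper import SPARQLWrapper, JSON

# Public DBpedia endpoint; assumed sufficient for this sketch.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def annotate_term(term: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples linking a term to DBpedia resources."""
    sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
    # Naive query by English label; real annotation would need disambiguation
    # and proper escaping of the term.
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?resource WHERE {{
            ?resource rdfs:label "{term}"@en .
        }} LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [(term, "http://www.w3.org/2000/01/rdf-schema#seeAlso", b["resource"]["value"])
            for b in bindings]

def annotate_learning_object(terms: list[str]) -> list[tuple[str, str, str]]:
    """Annotate all terms of one learning object, fanning the lookups out over a thread pool."""
    triples = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(annotate_term, terms):
            triples.extend(result)
    return triples

if __name__ == "__main__":
    # Hypothetical learning object described by a handful of terms.
    print(annotate_learning_object(["Semantic Web", "Ontology", "Metadata"]))
```

In the reported work the parallelism operates at a far coarser grain, distributing the annotation algorithm over a set of heterogeneous computing infrastructures rather than the threads of a single machine.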