A practical experience concerning the parallel semantic annotation of a large-scale data collection

  • Authors:
  • Javier Fabra; Sergio Hernández; Pedro Álvarez; Estefanía Otero; Juan Carlos Vidal; Manuel Lama

  • Affiliations:
  • Universidad de Zaragoza, Spain (J. Fabra, S. Hernández, P. Álvarez); Universidade de Santiago de Compostela, Spain (E. Otero, J. C. Vidal, M. Lama)

  • Venue:
  • Proceedings of the 9th International Conference on Semantic Systems
  • Year:
  • 2013

Abstract

From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One possible way of dealing with this drawback is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we show how the problem of semantically annotating a large-scale collection of learning objects has been addressed. The terms related to each learning object were processed, producing an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to deal with the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures has been used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
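
The paper describes a framework that integrates heterogeneous computing infrastructures; as a rough, non-authoritative illustration of the kind of workload involved, the Python sketch below annotates the terms of a single learning object against DBpedia's public SPARQL endpoint and parallelizes the lookups with a local thread pool. The endpoint URL, query shape, helper names (`annotate_term`, `annotate_learning_object`), and the thread-pool granularity are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' framework): annotate terms in parallel by
# looking up DBpedia resources through its public SPARQL endpoint and
# collecting the results as RDF-style triples.
from concurrent.futures import ThreadPoolExecutor
from SPARQLWrapper import SPARQLWrapper, JSON

# Public DBpedia endpoint; assumed sufficient for this sketch.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def annotate_term(term: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples linking a term to DBpedia resources."""
    sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
    # Naive query by English label; real annotation would need disambiguation
    # and proper escaping of the term.
    sparql.setQuery(f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?resource WHERE {{
            ?resource rdfs:label "{term}"@en .
        }} LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return [(term, "http://www.w3.org/2000/01/rdf-schema#seeAlso", b["resource"]["value"])
            for b in bindings]

def annotate_learning_object(terms: list[str]) -> list[tuple[str, str, str]]:
    """Annotate all terms of one learning object, fanning the lookups out over a thread pool."""
    triples = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(annotate_term, terms):
            triples.extend(result)
    return triples

if __name__ == "__main__":
    # Hypothetical learning object described by a handful of terms.
    print(annotate_learning_object(["Semantic Web", "Ontology", "Metadata"]))
```

In the reported work the parallelism operates at a far coarser grain, distributing the annotation algorithm over a set of heterogeneous computing infrastructures rather than the threads of a single machine.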