User-driven quality evaluation of DBpedia

Authors:
Amrapali Zaveri;Dimitris Kontokostas;Mohamed A. Sherif;Lorenz Bühmann;Mohamed Morsey;Sören Auer;Jens Lehmann
Affiliations:
AKSW/BIS, Universität Leipzig, Leipzig, Germany;AKSW/BIS, Universität Leipzig, Leipzig, Germany;AKSW/BIS, Universität Leipzig, Leipzig, Germany;AKSW/BIS, Universität Leipzig, Leipzig, Germany;AKSW/BIS, Universität Leipzig, Leipzig, Germany;CS/EIS, Universität Bonn, Bonn, Germany;AKSW/BIS, Universität Leipzig, Leipzig, Germany
Venue:
Proceedings of the 9th International Conference on Semantic Systems
Year:
2013

Abstract

Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets vary in quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources via crowdsourcing, according to the quality problem taxonomy. This process is supported by a tool in which a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia. We identified 17 data quality problem types, and 58 users assessed a total of 521 resources. Overall, 11.93% of the evaluated DBpedia triples were identified as having some quality issue. Applying the semi-automatic component yielded a total of 222,982 triples that have a high probability of being incorrect. In particular, we found that incorrectly extracted object values, irrelevant extraction of information, and broken links were the most recurring quality problems. With this study, we aim not only to assess the quality of this sample of DBpedia resources but also to adopt an agile methodology to improve the quality in future versions by regularly providing feedback to the DBpedia maintainers.