DBpedia SPARQL benchmark: performance assessment with real queries on real data

Authors:
Mohamed Morsey; Jens Lehmann; Sören Auer; Axel-Cyrille Ngonga Ngomo
Affiliation (all authors):
Department of Computer Science, University of Leipzig, Leipzig, Germany
Venue:
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Year:
2011

Abstract

Triple stores are the backbone of a growing number of Data Web applications. The performance of these stores is therefore mission-critical, both for individual projects and for data integration on the Data Web in general. Consequently, when implementing any such application, it is essential to have a clear picture of the strengths and weaknesses of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational stores with triple stores and therefore measured performance with SQL-like queries against relational databases that had been converted to RDF. In contrast, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data that does not resemble a relational schema. Our generic benchmark creation procedure is based on query-log mining, clustering, and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triple stores, and we provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. Comparing our results with those of other benchmarks indicates that the performance of triple stores is far less homogeneous than previous benchmarks suggested.
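The three stages named in the abstract (query-log mining, clustering, and SPARQL feature analysis) can be illustrated with a small Python sketch. This is not the authors' implementation. The whitespace normalization, the keyword list in `FEATURES`, the greedy clustering over `difflib.SequenceMatcher` similarity (swapped in for whatever similarity measure and clustering algorithm the paper actually uses), and the parameters `threshold` and `min_support` are all hypothetical choices made for illustration. The sketch counts normalized log entries, groups textually similar queries, keeps clusters with enough support, and records which SPARQL features each representative query uses.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Illustrative subset of SPARQL keywords to detect (hypothetical list,
# not necessarily the feature set used in the paper).
FEATURES = ("UNION", "OPTIONAL", "FILTER", "DISTINCT", "REGEX", "LANG", "STR")


def normalize(query):
    """Collapse runs of whitespace so trivially different log entries match."""
    return " ".join(query.split())


def features(query):
    """Return the FEATURES keywords that occur as whole words in the query."""
    upper = query.upper()
    return {f for f in FEATURES if re.search(rf"\b{f}\b", upper)}


def cluster(queries, threshold=0.8):
    """Greedy single-pass clustering (a stand-in, not the paper's method).

    A query joins the first cluster whose representative (first member) has a
    SequenceMatcher ratio >= threshold with it; otherwise it starts a new one.
    """
    clusters = []
    for q in queries:
        for c in clusters:
            if SequenceMatcher(None, c[0], q).ratio() >= threshold:
                c.append(q)
                break
        else:
            clusters.append([q])
    return clusters


def build_benchmark(log, threshold=0.8, min_support=2):
    """Mine a query log into (representative, support, features) tuples."""
    counts = Counter(normalize(q) for q in log)
    # Most frequent queries come first, so each cluster's representative
    # is its most frequently issued member.
    distinct = [q for q, _ in counts.most_common()]
    benchmark = []
    for c in cluster(distinct, threshold):
        support = sum(counts[q] for q in c)  # total log occurrences in cluster
        if support >= min_support:
            rep = c[0]
            benchmark.append((rep, support, sorted(features(rep))))
    return benchmark


if __name__ == "__main__":
    # Tiny hypothetical log, for illustration only.
    log = [
        "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/City> }",
        "SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/City> }",
        "SELECT ?s WHERE {  ?s a <http://dbpedia.org/ontology/Town> }",
        "SELECT DISTINCT ?l WHERE { ?x rdfs:label ?l FILTER (LANG(?l) = 'en') }",
    ]
    for rep, support, feats in build_benchmark(log, min_support=1):
        print(support, feats, rep)
```

Ordering distinct queries by frequency before clustering is a simple way to make the first member of each cluster its most frequently issued query, so that query can serve directly as the cluster's benchmark representative.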