The widespread adoption of the Resource Description Framework (RDF) for representing both open web and enterprise data is driving growing research interest in RDF data management. As RDF data management systems proliferate, so do benchmarks that test the scalability and performance of these systems under data and workloads with various characteristics. In this paper, we compare data generated by existing RDF benchmarks with data found in widely used real RDF datasets. The comparison shows that existing benchmark data have little in common with real data. Therefore, conclusions drawn from existing benchmark tests might not translate to expected behaviours in real settings. As for the comparison itself, we show that simple primitive data metrics are inadequate to expose the fundamental differences between real and benchmark data. We make two contributions in this paper: (1) to address the limitations of the primitive metrics, we introduce intuitive and novel metrics that highlight the key differences between distinct datasets; (2) to address the limitations of existing benchmarks, we introduce a new benchmark generator with the following novel characteristics: (a) the generator can take any (real or synthetic) dataset and convert it into a benchmark dataset; (b) the generator can produce data that mimic the characteristics of real datasets with user-specified data properties. On the technical side, we formulate benchmark generation as an integer programming problem whose solution yields the desired benchmark datasets. To our knowledge, this is the first methodological study of RDF benchmarks, as well as the first attempt at generating RDF benchmarks in a principled way.