Issues in big data testing and benchmarking

  • Authors:
  • Alexander Alexandrov; Christoph Brücke; Volker Markl

  • Affiliation:
  • Technische Universität Berlin, Berlin, Germany (all authors)

  • Venue:
  • Proceedings of the Sixth International Workshop on Testing Database Systems
  • Year:
  • 2013

Abstract

The academic community and industry are currently researching and building next-generation data management systems. These systems are designed to analyze high-volume data sets at high ingest rates and with short response times, executing complex data analysis algorithms on data that does not adhere to relational data models. Because these big data systems differ from standard relational database systems in both their data and their workloads, the traditional benchmarks used by the database community are insufficient. In this paper, we describe initial solutions and open challenges in big data generation: methods for creating realistic, privacy-aware, and arbitrarily scalable data sets, workloads, and benchmarks from real-world data. In particular, we discuss why the workloads currently considered in the testing and benchmarking community do not capture the real complexity of big data, and we highlight several research challenges in massively parallel data generation and data characterization.
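To illustrate the kind of "arbitrarily scalable" generation the abstract refers to, below is a minimal, hypothetical sketch (not taken from the paper, and not the authors' actual generator): each record's pseudo-random state is derived solely from a global seed and the record index, so any range of the keyspace can be produced by any worker, in any order, without coordination or shared state. All names (`generate_record`, `generate_partition`, the field schema) are illustrative assumptions.

```python
# Hypothetical sketch of partition-parallel, deterministic data generation.
# Each record is generated from a seed derived only from (GLOBAL_SEED, index),
# so workers can produce disjoint or even overlapping ranges independently
# and still emit bit-identical records -- the property that lets a data set
# scale to arbitrary sizes across massively parallel workers.
import random

GLOBAL_SEED = 42  # illustrative value


def generate_record(index: int) -> dict:
    """Generate record `index` deterministically, independent of all others."""
    # Mix the index into the seed with a fixed odd multiplier (illustrative).
    rng = random.Random(GLOBAL_SEED ^ (index * 0x9E3779B9))
    return {
        "id": index,
        "age": rng.randint(18, 90),          # assumed example field
        "score": round(rng.uniform(0.0, 1.0), 4),
    }


def generate_partition(start: int, end: int) -> list:
    """A worker generates records [start, end) with no coordination."""
    return [generate_record(i) for i in range(start, end)]


# Two workers generating overlapping ranges produce identical records,
# demonstrating that the partitioning scheme does not affect the output.
a = generate_partition(0, 10)
b = generate_partition(5, 15)
assert a[5:] == b[:5]
```

The key design choice is that no generator call depends on the previous record, which trades the convenience of a single sequential stream for embarrassingly parallel scale-out; realistic generators additionally have to preserve cross-record correlations, which is one of the research challenges the abstract raises.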