SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
The BUCKY object-relational benchmark
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Programming pearls (2nd ed.)
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
XMark: a benchmark for XML data management
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Why you should run TPC-DS: a workload analysis
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Proceedings of the VLDB Endowment
Principles for an ETL Benchmark
Performance Evaluation and Benchmarking
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
A data generator for cloud-scale benchmarking
TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems
EXRT: towards a simple benchmark for XML readiness testing
TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems
YCSB++: benchmarking and performance debugging advanced features in scalable table stores
Proceedings of the 2nd ACM Symposium on Cloud Computing
Efficient update data generation for DBMS benchmarks
ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Solving big data challenges for enterprise application performance management
Proceedings of the VLDB Endowment
DeepSea: self-adaptive data partitioning and replication in scalable distributed data systems
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
Can we analyze big data inside a DBMS?
Proceedings of the sixteenth international workshop on Data warehousing and OLAP
A multidimensional data model for TPC-DS benchmarking
Proceedings of the 5th Asia-Pacific Symposium on Internetware
Hi-index | 0.00 |
There is a tremendous interest in big data by academia, industry and a large user base. Several commercial and open source providers unleashed a variety of products to support big data storage and processing. As these products mature, there is a need to evaluate and compare the performance of these systems. In this paper, we present BigBench, an end-to-end big data benchmark proposal. The underlying business model of BigBench is a product retailer. The proposal covers a data model and synthetic data generator that addresses the variety, velocity and volume aspects of big data systems containing structured, semi-structured and unstructured data. The structured part of the BigBench data model is adopted from the TPC-DS benchmark, which is enriched with semi-structured and unstructured data components. The semi-structured part captures registered and guest user clicks on the retailer's website. The unstructured data captures product reviews submitted online. The data generator designed for BigBench provides scalable volumes of raw data based on a scale factor. The BigBench workload is designed around a set of queries against the data model. From a business prospective, the queries cover the different categories of big data analytics proposed by McKinsey. From a technical prospective, the queries are designed to span three different dimensions based on data sources, query processing types and analytic techniques. We illustrate the feasibility of BigBench by implementing it on the Teradata Aster Database. The test includes generating and loading a 200 Gigabyte BigBench data set and testing the workload by executing the BigBench queries (written using Teradata Aster SQL-MR) and reporting their response times.