Run-time performance optimization of a BigData query language

  • Authors:
  • Yanbin Liu; Parijat Dube; Scott C. Gray

  • Affiliations:
  • IBM Watson Research Center, Yorktown Heights, USA; IBM Watson Research Center, Yorktown Heights, USA; IBM Watson Research Center, Yorktown Heights, USA

  • Venue:
  • Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering
  • Year:
  • 2014

Abstract

JAQL is a query language for large-scale data that connects BigData analytics with the MapReduce framework. JAQL is also an IBM product, and its performance is critical for IBM InfoSphere BigInsights, a BigData analytics platform. In this paper, we report our work on improving JAQL performance from multiple perspectives: we explore the parallelism of JAQL, profile JAQL for performance analysis, identify I/O as the dominant performance bottleneck, and improve JAQL performance with an emphasis on reducing I/O data size and increasing (de)serialization efficiency. Using the TPC-H benchmark on a simple Hadoop cluster, we report performance improvements of up to 2x in JAQL from our optimization fixes.
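To illustrate why shrinking the serialized form of each record reduces I/O, the following Python sketch compares a self-describing JSON-text encoding of one record with a schema-based binary encoding. The record fields, the fixed schema, and the helper names (`to_json`, `to_binary`, `from_binary`) are hypothetical. This is not JAQL's actual serialization format or the optimization described in the paper, only a minimal sketch of the general idea.

```python
import json
import struct

# Hypothetical record loosely resembling a TPC-H lineitem row (a few fields only).
record = {"orderkey": 1, "partkey": 155190, "quantity": 17.0, "shipmode": "TRUCK"}


def to_json(rec):
    # Self-describing text: every field name is written out again for every record.
    return json.dumps(rec).encode("utf-8")


# Assumed fixed schema shared by writer and reader, so field names are not stored:
# little-endian int64 orderkey, int64 partkey, float64 quantity.
SCHEMA = struct.Struct("<qqd")


def to_binary(rec):
    mode = rec["shipmode"].encode("utf-8")
    # Fixed-width numeric fields, then a 1-byte length prefix and the string bytes.
    header = SCHEMA.pack(rec["orderkey"], rec["partkey"], rec["quantity"])
    return header + struct.pack("<B", len(mode)) + mode


def from_binary(buf):
    # Decode the fixed-width fields, then the length-prefixed string.
    orderkey, partkey, quantity = SCHEMA.unpack_from(buf, 0)
    n = buf[SCHEMA.size]
    mode = buf[SCHEMA.size + 1 : SCHEMA.size + 1 + n].decode("utf-8")
    return {"orderkey": orderkey, "partkey": partkey, "quantity": quantity, "shipmode": mode}


text = to_json(record)
binary = to_binary(record)
print(len(text), len(binary))
assert from_binary(binary) == record  # lossless round trip for this record
```

The binary form is smaller here because it omits field names and stores numbers in fixed width. The trade-off is that reader and writer must agree on the schema in advance.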