Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
FLEX: a slot allocation scheduling optimizer for MapReduce workloads
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Comparing high level mapreduce query languages
APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Adaptive MapReduce using situation-aware mappers
Proceedings of the 15th International Conference on Extending Database Technology
Same Queries, Different Data: Can We Predict Runtime Performance?
ICDEW '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops
A platform for eXtreme analytics
IBM Journal of Research and Development
Hi-index | 0.00 |
JAQL is a query language for large-scale data that connects BigData analytics and MapReduce framework together. Also an IBM product, JAQL's performance is critical for IBM InfoSphere BigInsights, a BigData analytics platform. In this paper, we report our work on improving JAQL performance from multiple perspectives. We explore the parallelism of JAQL, profile JAQL for performance analysis, identify I/O as the dominant performance bottleneck, and improve JAQL performance with an emphasis on reducing I/O data size and increasing (de)serialization efficiency. With TPCH benchmark on a simple Hadoop cluster, we report up to 2x performance improvements in JAQL with our optimization fixes.