A fast and high throughput SQL query system for big data

Authors:
Feng Zhu; Jie Liu; Lijie Xu
Affiliations:
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China (all authors)
Venue:
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Year:
2012

Abstract

Relational queries play an important role in data analysis, but scaling out a traditional SQL query system remains a challenging problem. In this paper, we introduce a fast, high-throughput, and scalable system that executes read-only SQL queries by taking advantage of the distributed architecture of NoSQL stores. We adopt HBase as the storage layer and design a distributed query engine (DQE) that works with it to execute SQL queries. Our system also contains distinctive index and cache mechanisms to accelerate query processing. Finally, we evaluate the system on real-world big data crawled from Sina Weibo, where it achieves good performance on nineteen representative SQL queries.
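
The abstract gives no implementation detail, so the following is only a minimal single-process sketch of the general idea it names: a read-only equality query answered over a key-value table, with a secondary index to avoid full scans and a small result cache. Every name here (`KeyValueTable`, `SecondaryIndex`, `QueryEngine`, `select_where_eq`) is hypothetical, the in-memory dictionary merely stands in for an HBase table, and nothing below should be read as the authors' DQE design or as the HBase API.

```python
from collections import OrderedDict


class KeyValueTable:
    """Hypothetical in-memory stand-in for a row-key-ordered store such as HBase."""

    def __init__(self):
        self.rows = {}   # row key -> {column: value}
        self.scans = 0   # number of full scans started, for illustration only

    def put(self, row_key, columns):
        self.rows[row_key] = dict(columns)

    def get(self, row_key):
        return self.rows.get(row_key)

    def scan(self):
        """Yield every (row key, row) pair in row-key order."""
        self.scans += 1
        for key in sorted(self.rows):
            yield key, self.rows[key]


class SecondaryIndex:
    """Assumed index layout: column value -> list of row keys holding that value."""

    def __init__(self, table, column):
        self.column = column
        self.entries = {}
        for key, row in table.scan():  # built once; the data is treated as read-only
            self.entries.setdefault(row.get(column), []).append(key)

    def lookup(self, value):
        return self.entries.get(value, [])


class QueryEngine:
    """Answers SELECT * FROM t WHERE column = value using cache, then index, then scan."""

    def __init__(self, table, indexes, cache_size=2):
        self.table = table
        self.indexes = {index.column: index for index in indexes}
        self.cache = OrderedDict()  # least recently used entry sits at the front
        self.cache_size = cache_size
        self.cache_hits = 0

    def select_where_eq(self, column, value):
        cache_key = (column, value)
        if cache_key in self.cache:
            self.cache.move_to_end(cache_key)  # mark as most recently used
            self.cache_hits += 1
            return self.cache[cache_key]
        if column in self.indexes:
            # Indexed column: fetch only the matching rows by key.
            keys = self.indexes[column].lookup(value)
            result = [self.table.get(key) for key in keys]
        else:
            # No index available: fall back to a full table scan.
            result = [row for _, row in self.table.scan() if row.get(column) == value]
        self.cache[cache_key] = result
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result
```

In this sketch a repeated query is served from the cache, a query on an indexed column touches only the matching row keys, and only a query on an unindexed column pays for a full scan. A distributed engine would additionally have to partition this work across nodes, which the sketch does not attempt.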