DigestJoin: Exploiting Fast Random Reads for Flash-Based Joins

Authors:
Yu Li; Sai Tung On; Jianliang Xu; Byron Choi; Haibo Hu
Venue:
MDM '09 Proceedings of the 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware
Year:
2009

Cited by 7:

StableBuffer: optimizing write performance for DBMS applications on flash devices
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

ESQP: an efficient SQL query processing for cloud data management
CloudDB '10 Proceedings of the second international workshop on Cloud data management

Report on the first international workshop on flash-based database systems (FlashDB 2011)
ACM SIGMOD Record

DigestJoin: expediting joins on solid-state drives
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II

Improving database performance using a flash-based write cache
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications

Scan and join optimization by exploiting internal parallelism of flash-based solid state drives
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Can SSDs help reduce random i/os in hash joins?
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Abstract

Flash disks are an emerging secondary storage medium. In particular, some portable devices, multimedia players, and laptop computers are now configured with flash disks and no magnetic disks. It is envisioned that some RDBMSs will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic disks. Thus, in this paper, we study join processing, the core of query processing in RDBMSs, on flash disks. Specifically, we propose a new join method, called DigestJoin, to exploit the fast random reads of flash disks. DigestJoin consists of two phases: (1) projecting the join attributes, followed by a join on the projected attributes; and (2) fetching the full tuples that satisfy the join to produce the final join results. Although the problem of tuple/page fetching with minimum I/O cost (in the second phase) is intractable, we propose three heuristic fetching strategies. We have implemented DigestJoin on a real flash disk for performance evaluation. Experiments on TPC-H datasets show that DigestJoin clearly outperforms the traditional sort-merge join under various system configurations.
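
The two phases described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the page layout, the function names (make_pages, digest, join_digests, fetch, digest_join), the use of a hash join on the digests, and the "read each needed page once, in page order" fetch are all assumptions made for illustration. In particular, the fetch shown here is a naive stand-in and is not claimed to be one of the paper's three heuristic strategies.

```python
from collections import defaultdict

PAGE_SIZE = 2  # tuples per page; an illustrative value, not taken from the paper


def make_pages(rows, page_size=PAGE_SIZE):
    """Split a list of rows into fixed-size pages (a stand-in for disk pages)."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def digest(pages, key):
    """Phase 1a: project every tuple to (join key, tuple id).

    The tuple id is (page number, slot number), which is enough to
    locate the full tuple again in phase 2.
    """
    return [(row[key], (p, s))
            for p, page in enumerate(pages)
            for s, row in enumerate(page)]


def join_digests(left, right):
    """Phase 1b: join the two digest tables on the key.

    A hash join is used here for brevity (an assumption); the result is a
    list of (left tuple id, right tuple id) pairs.
    """
    index = defaultdict(list)
    for k, tid in right:
        index[k].append(tid)
    return [(ltid, rtid) for k, ltid in left for rtid in index.get(k, ())]


def fetch(pages, tids):
    """Phase 2: fetch full tuples, reading each needed page exactly once.

    Returns the fetched rows keyed by tuple id and the number of
    simulated page reads.
    """
    needed = defaultdict(set)
    for p, s in tids:
        needed[p].add(s)
    rows, reads = {}, 0
    for p in sorted(needed):
        page = pages[p]  # one simulated random page read
        reads += 1
        for s in needed[p]:
            rows[(p, s)] = page[s]
    return rows, reads


def digest_join(left_rows, right_rows, key):
    """Run both phases and return (joined rows, total page reads in phase 2)."""
    lp, rp = make_pages(left_rows), make_pages(right_rows)
    pairs = join_digests(digest(lp, key), digest(rp, key))
    lrows, lreads = fetch(lp, [l for l, _ in pairs])
    rrows, rreads = fetch(rp, [r for _, r in pairs])
    result = [{**lrows[l], **rrows[r]} for l, r in pairs]
    return result, lreads + rreads
```

Because phase 1 touches only the join keys and tuple ids, the full tuples are read only for rows that actually join. The cost of phase 2 then depends on how many distinct pages must be fetched, which is the quantity the abstract's fetching strategies aim to keep low.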