Query processing techniques for solid state drives

Authors:
Dimitris Tsirogiannis; Stavros Harizopoulos; Mehul A. Shah; Janet L. Wiener; Goetz Graefe
Affiliations:
University of Toronto, Toronto, ON, Canada; HP Labs, Palo Alto, CA, USA; Hewlett Packard Laboratories, Palo Alto, CA, USA; Hewlett Packard Laboratories, Palo Alto, CA, USA; Hewlett Packard Laboratories, Palo Alto, CA, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may immediately benefit applications that stress random reads, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks, and their algorithms currently emphasize sequential accesses for disk-resident data. In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Its fetch kernel then retrieves, on demand, the attributes required by later nodes in the query plan. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.
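The interplay between a column-based page layout and the join/fetch split of FlashJoin can be illustrated with a small in-memory sketch. It assumes a PAX-style page in which each attribute's values for the page's rows are stored contiguously, so a scan can touch only the columns it needs. The classes `PaxPage` and `Table`, the functions `join_kernel` and `fetch_kernel`, and the `minipage_reads` counter are hypothetical names; the join kernel is a plain in-memory hash join (an assumption), and `minipage_reads` is only a rough stand-in for I/O. This is not the paper's Postgres implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: all names and structures are illustrative, not from the paper.

@dataclass
class PaxPage:
    """A page with a column-based layout: attribute name -> that attribute's
    values for every row on the page (one contiguous 'minipage' each)."""
    columns: dict

@dataclass
class Table:
    pages: list
    minipage_reads: int = 0  # rough proxy for I/O: one count per minipage touched

    def scan_column(self, name):
        """Yield (rid, value) for one attribute, touching only its minipage per page."""
        for page_no, page in enumerate(self.pages):
            self.minipage_reads += 1
            for slot, value in enumerate(page.columns[name]):
                yield (page_no, slot), value

    def fetch(self, rid, names):
        """Random read of selected attributes for a single row id (rid = (page, slot))."""
        page_no, slot = rid
        self.minipage_reads += len(names)
        page = self.pages[page_no]
        return tuple(page.columns[n][slot] for n in names)


def join_kernel(left, right, left_key, right_key):
    """Equi-join that reads only the join attributes and returns a join index:
    a list of (left_rid, right_rid) pairs. Uses a simple hash join (an assumption)."""
    buckets = defaultdict(list)
    for rid, key in left.scan_column(left_key):
        buckets[key].append(rid)
    join_index = []
    for right_rid, key in right.scan_column(right_key):
        for left_rid in buckets.get(key, []):
            join_index.append((left_rid, right_rid))
    return join_index


def fetch_kernel(join_index, left, right, left_attrs, right_attrs):
    """Materialize only the attributes a later plan node needs, one rid pair at a time."""
    for left_rid, right_rid in join_index:
        yield left.fetch(left_rid, left_attrs) + right.fetch(right_rid, right_attrs)


orders = Table([PaxPage({"o_id": [1, 2], "cust": [10, 20], "total": [5.0, 7.5]}),
                PaxPage({"o_id": [3], "cust": [10], "total": [2.0]})])
customers = Table([PaxPage({"c_id": [10, 20], "name": ["ann", "bob"]})])

idx = join_kernel(orders, customers, "cust", "c_id")
rows = list(fetch_kernel(idx, orders, customers, ["total"], ["name"]))
print(rows)  # [(5.0, 'ann'), (2.0, 'ann'), (7.5, 'bob')]
```

In this toy run the join kernel never reads `total` or `name`; those attributes are fetched by row id only for tuples that actually join. That pattern of small random reads is the kind of access the abstract argues fast SSD random reads make affordable.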