Column-stores vs. row-stores: how different are they really?

Authors:
Daniel J. Abadi; Samuel R. Madden; Nabil Hachem
Affiliations:
Yale University, New Haven, CT, USA; MIT, Cambridge, MA, USA; AvantGarde Consulting, LLC, Shrewsbury, MA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008



Abstract

There has been a significant amount of excitement and recent work on column-oriented database systems ("column-stores"). These database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems ("row-stores") on analytical workloads such as those found in data warehouses, decision support, and business intelligence applications. The elevator pitch behind this performance difference is straightforward: column-stores are more I/O efficient for read-only queries since they only have to read from disk (or from memory) those attributes accessed by a query. This simplistic view leads to the assumption that one can obtain the performance benefits of a column-store using a row-store: either by vertically partitioning the schema, or by indexing every column so that columns can be accessed independently. In this paper, we demonstrate that this assumption is false. We compare the performance of a commercial row-store under a variety of different configurations with a column-store and show that the row-store performance is significantly slower on a recently proposed data warehouse benchmark. We then analyze the performance difference and show that there are some important differences between the two systems at the query executor level (in addition to the obvious differences at the storage layer level). Using the column-store, we then tease apart these differences, demonstrating the impact on performance of a variety of column-oriented query execution techniques, including vectorized query processing, compression, and a new join algorithm we introduce in this paper. We conclude that while it is not impossible for a row-store to achieve some of the performance advantages of a column-store, changes must be made to both the storage layer and the query executor to fully obtain the benefits of a column-oriented approach.
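The "elevator pitch" in the abstract, that a column-store reads only the attributes a query touches, can be illustrated with a minimal sketch. The toy table, column names, and helper functions below are hypothetical and do not come from the paper; the sketch counts field values fetched rather than modeling real disk pages. It covers only the storage-layer difference, which the abstract argues is not sufficient on its own to explain the performance gap.

```python
# Hypothetical toy table; names and values are illustrative, not from the paper.
COLUMNS = ("order_id", "customer", "quantity", "revenue")
ROWS = [
    (1, "alice", 3, 30.0),
    (2, "bob", 1, 12.5),
    (3, "carol", 7, 70.0),
    (4, "dave", 2, 19.0),
]


def to_column_store(rows, columns):
    """Pivot row tuples into one list per attribute (a simplified column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def scan_row_store(rows, columns, wanted):
    """Sum one attribute over a row layout.

    Assumption: fetching a row brings in all of its fields, so every
    field of every row is counted as read.
    """
    idx = columns.index(wanted)
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole tuple is fetched
        total += row[idx]
    return total, values_read


def scan_column_store(store, wanted):
    """Sum one attribute over a column layout; only that column is read."""
    column = store[wanted]
    return sum(column), len(column)


if __name__ == "__main__":
    store = to_column_store(ROWS, COLUMNS)
    print(scan_row_store(ROWS, COLUMNS, "revenue"))
    print(scan_column_store(store, "revenue"))
```

Both scans compute the same aggregate, but the row layout touches every field of every row while the column layout touches only the requested attribute. Executor-level techniques named in the abstract, such as compression and vectorized processing, are deliberately left out of this sketch.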