Efficient and scalable data evolution with column oriented databases

Authors:
Ziyang Liu; Bin He; Hui-I Hsiao; Yi Chen
Affiliations:
Arizona State University; IBM Almaden Research Center; IBM Almaden Research Center; Arizona State University
Venue:
Proceedings of the 14th International Conference on Extending Database Technology
Year:
2011

Citing 28
Cited 2

Improved query performance with variant indexes

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The Clio project: managing heterogeneity

ACM SIGMOD Record
Database Management Systems

Database Management Systems
Temporal and Real-Time Databases: A Survey

IEEE Transactions on Knowledge and Data Engineering
Schema evolution in data warehouses

Knowledge and Information Systems
Algorithms for data migration with cloning

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Byte-aligned bitmap compression

DCC '95 Proceedings of the Conference on Data Compression
Data migration to minimize the total completion time

Journal of Algorithms
C-store: a column-oriented DBMS

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Semantic adaptation of schema mappings when schemas evolve

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Optimizing bitmap indices with efficient compression

ACM Transactions on Database Systems (TODS)
Integrating compression and execution in column-oriented database systems

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Performance tradeoffs in read-optimized databases

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Database application evolution: a transformational approach

Data & Knowledge Engineering - Special issue: ER 2003
Making database systems usable

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Mapping adaptation under evolving schemas

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
On the performance of bitmap indices for high cardinality attributes

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Scalable semantic web data management using vertical partitioning

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Column-stores vs. row-stores: how different are they really?

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Graceful database schema evolution: the PRISM workbench

Proceedings of the VLDB Endowment
Managing and querying transaction-time databases under schema evolution

Proceedings of the VLDB Endowment
Automating database schema evolution in information system upgrades

Proceedings of the 2nd International Workshop on Hot Topics in Software Upgrades
SIMPLE: A Strategic Information Mining Platform for Licensing and Execution

ICDMW '09 Proceedings of the 2009 IEEE International Conference on Data Mining Workshops
Position list word aligned hybrid: optimizing space and performance for compressed bitmaps

Proceedings of the 13th International Conference on Extending Database Technology
CODS: evolving data efficiently and scalably in column oriented databases

Proceedings of the VLDB Endowment
Relational schema evolution for program independency

CIT'04 Proceedings of the 7th international conference on Intelligent Information Technology
Improved algorithms for data migration

APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation
Co-transformations in database applications evolution

GTTSE'05 Proceedings of the 2005 international conference on Generative and Transformational Techniques in Software Engineering

Automating the database schema evolution process

The VLDB Journal — The International Journal on Very Large Data Bases
Slowly changing measures

Proceedings of the sixteenth international workshop on Data warehousing and OLAP

Abstract

Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and migrating the data to the updated schema (data evolution). It becomes desirable or necessary when the data or the query workload changes, when the initial schema was not carefully designed, or when a better understanding of the data suggests a better schema. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [8]. Although schema evolution has been studied extensively, data evolution has long been prohibitively expensive, because the data is essentially evolved by executing SQL queries and reconstructing indexes. This cost prevents databases from being changed flexibly and frequently as needs arise, and it forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that make data evolution efficient would ease this constraint. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column-oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column-oriented databases have better potential than traditional row-oriented databases for supporting data evolution, and we propose a novel data-level data evolution framework on column-oriented databases. Experimental evaluations on real and synthetic data suggest that our approach is much more efficient than query-level data evolution on both row-oriented and column-oriented databases, which involves unnecessary access to irrelevant data, materialization of intermediate results, and reconstruction of indexes.
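
To make the storage distinction concrete, the following is a minimal Python sketch of why a column layout can avoid touching irrelevant data during a schema change. It is not the framework proposed in the paper: the table contents, the dropped attribute, and the helper names (`to_columns`, `project_rows`, `project_columns`) are all assumptions chosen for illustration, and in-memory lists stand in for on-disk storage.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.
schema = ("id", "name", "city", "state")
rows = [
    (1, "Ann", "Tempe", "AZ"),
    (2, "Bob", "San Jose", "CA"),
    (3, "Cy", "Tempe", "AZ"),
]


def to_columns(schema, rows):
    """Column layout: store each attribute contiguously, one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(schema)}


def project_rows(schema, rows, keep):
    """Row layout: every tuple is read and rebuilt, including unchanged values."""
    idx = [schema.index(name) for name in keep]
    return [tuple(row[i] for i in idx) for row in rows]


def project_columns(columns, keep):
    """Column layout: reuse the kept columns as-is; no stored value is copied."""
    return {name: columns[name] for name in keep}


columns = to_columns(schema, rows)

# Hypothetical schema change: drop the "state" attribute.
keep = ("id", "name", "city")
evolved_rows = project_rows(schema, rows, keep)
evolved_columns = project_columns(columns, keep)

# Both layouts describe the same evolved table...
assert evolved_rows == list(zip(*(evolved_columns[n] for n in keep)))
# ...but the column version shares storage with the original columns.
assert evolved_columns["city"] is columns["city"]
```

In this toy setting the row-oriented version rewrites every tuple, while the column-oriented version only changes which columns belong to the table. Real systems also have to deal with compression and indexes, which this sketch does not model.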