Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format

Authors:
Jennifer L. Beckmann;Alan Halverson;Rajasekar Krishnamurthy;Jeffrey F. Naughton
Affiliations:
University of Wisconsin;University of Wisconsin;IBM Research-Almaden;University of Wisconsin
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Cited 20

Scalable semantic web data management using vertical partitioning
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

A relational approach to incrementally extracting and querying structure in unstructured data
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Multi-tenant databases for software as a service: schema-mapping techniques
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Hexastore: sextuple indexing for semantic web data management
Proceedings of the VLDB Endowment

Relational support for flexible schema scenarios
Proceedings of the VLDB Endowment

Pivoted table index for querying product-property-value information
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication

Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs
IEICE - Transactions on Information and Systems

SW-Store: a vertically partitioned DBMS for Semantic Web data management
The VLDB Journal — The International Journal on Very Large Data Bases

A comparison of flexible schemas for software as a service
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Incremental aggregation of RFID data
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium

Relational processing of RDF queries: a survey
ACM SIGMOD Record

Efficient set-correlation operator inside databases
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

CW2I: community data indexing for complex query processing
WAIM'10 Proceedings of the 11th international conference on Web-age information management

Lessons learned from DB2 pureXML applications: a practitioner's perspective
XSym'10 Proceedings of the 7th international XML database conference on Database and XML technologies

Native support of multi-tenancy in RDBMS for software as a service
Proceedings of the 14th International Conference on Extending Database Technology

A scalable and extensible framework for query answering over RDF
World Wide Web

ISIS: a new approach for efficient similarity search in sparse databases
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II

3SEPIAS: A Semi-Structured Search Engine for Personal Information in dAtaspace System
Information Sciences: an International Journal

Indexing dataspaces with partitions
World Wide Web

An index model for multitenant data storage in saas
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Abstract

"Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems (RDBMSs). If one uses the normal "horizontal" schema to store such data sets in any of the three leading commercial RDBMSs, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a "vertical" schema, storage utilization does improve, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that across a wide range of queries and sparse data sets the interpreted storage approach dominates both the current horizontal storage and vertical schema approaches in query efficiency and ease of use.
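
The three layouts contrasted in the abstract can be illustrated with a small Python sketch. It is not the paper's PostgreSQL implementation: the abstract does not specify the on-disk format, so the (attribute id, length, value) encoding, the `CATALOG` mapping, and all function names below are assumptions chosen only to show the idea of storing non-null attributes as self-describing fields that are interpreted at read time.

```python
import struct

# Hypothetical catalog mapping attribute names to numeric ids (illustrative only).
CATALOG = {"name": 0, "color": 1, "voltage": 2, "screen_size": 3, "isbn": 4}


def to_horizontal(row):
    """Horizontal schema: one slot per declared attribute, None for each null."""
    return [row.get(attr) for attr in sorted(CATALOG, key=CATALOG.get)]


def to_vertical(oid, row):
    """Vertical schema: one (oid, attribute, value) tuple per non-null attribute."""
    return [(oid, attr, val) for attr, val in row.items()]


def encode_interpreted(row):
    """Assumed interpreted layout: pack only non-null attributes as
    (id, length, value) fields. Values are stored as UTF-8 text for simplicity."""
    out = bytearray()
    for attr, val in row.items():
        data = str(val).encode("utf-8")
        out += struct.pack("<HH", CATALOG[attr], len(data)) + data
    return bytes(out)


def get_attribute(blob, attr):
    """Scan the packed fields; return the text value, or None if the attribute is absent."""
    want = CATALOG[attr]
    pos = 0
    while pos < len(blob):
        attr_id, length = struct.unpack_from("<HH", blob, pos)
        pos += 4  # skip the 2-byte id and 2-byte length header
        if attr_id == want:
            return blob[pos:pos + length].decode("utf-8")
        pos += length
    return None


row = {"name": "lamp", "voltage": 120}
blob = encode_interpreted(row)
print(to_horizontal(row))    # nulls occupy a slot for every unused attribute
print(to_vertical(7, row))   # one tuple per non-null attribute
print(get_attribute(blob, "voltage"), get_attribute(blob, "color"))
```

In this toy version the horizontal row grows with the number of declared attributes, the vertical form needs one tuple (and, in a real system, a join) per attribute to reassemble an object, and the packed form keeps the object in one tuple while spending space only on attributes that are present.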