The case for a wide-table approach to manage sparse relational data sets

Authors:
Eric Chu; Jennifer Beckmann; Jeffrey Naughton
Affiliations:
University of Wisconsin-Madison, Madison, WI; Microsoft Corporation, Redmond, WA; University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007

Abstract

A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
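A toy Python sketch may help make the storage and querying requirements concrete. Everything named in it (`SparseWideTable`, `insert`, `get`, `find_attributes`, `select`, `stored_cells`, and the sample attributes) is hypothetical. It assumes a simple in-memory layout in which each row keeps only its non-null attribute/value pairs and each attribute keeps a list of the rows where it is non-null. It is not the storage format, attribute-discovery technique, or index the paper proposes. It only illustrates why a single wide table need not pay for its NULLs, and how a query can start from the sparsest referenced attribute.

```python
from typing import Any, Dict, Iterable, List, Optional


class SparseWideTable:
    """Hypothetical in-memory wide table that stores only non-null cells."""

    def __init__(self, attributes: Iterable[str]) -> None:
        # Catalog of every attribute the table may hold (possibly thousands).
        self.attributes: List[str] = list(attributes)
        self._attr_ids: Dict[str, int] = {a: i for i, a in enumerate(self.attributes)}
        # Each row maps attribute id -> value; a missing key means NULL.
        self.rows: List[Dict[int, Any]] = []
        # Per-attribute list of row ids with a non-null value, so index size
        # tracks the number of non-null cells rather than the number of rows.
        self._nonnull: Dict[int, List[int]] = {}

    def insert(self, values: Dict[str, Any]) -> int:
        row_id = len(self.rows)
        row: Dict[int, Any] = {}
        for name, value in values.items():
            if value is None:
                continue  # NULLs are not stored at all
            attr_id = self._attr_ids[name]  # raises KeyError for unknown names
            row[attr_id] = value
            self._nonnull.setdefault(attr_id, []).append(row_id)
        self.rows.append(row)
        return row_id

    def get(self, row_id: int, name: str) -> Optional[Any]:
        # Returns None (standing in for SQL NULL) when the cell is absent.
        return self.rows[row_id].get(self._attr_ids[name])

    def find_attributes(self, keyword: str) -> List[str]:
        # Naive case-insensitive substring search over attribute names, a
        # hypothetical stand-in for helping users locate attributes.
        k = keyword.lower()
        return [a for a in self.attributes if k in a.lower()]

    def select(self, predicates: Dict[str, Any]) -> List[int]:
        """Ids of rows whose attributes equal every given non-null value."""
        if not predicates:
            return list(range(len(self.rows)))
        ids = {name: self._attr_ids[name] for name in predicates}
        # Drive the scan from the attribute with the fewest non-null rows;
        # rows where that attribute is NULL can never satisfy equality.
        driver = min(ids.values(), key=lambda a: len(self._nonnull.get(a, [])))
        return [
            row_id
            for row_id in self._nonnull.get(driver, [])
            if all(self.rows[row_id].get(ids[n]) == v for n, v in predicates.items())
        ]

    def stored_cells(self) -> int:
        # Number of non-null values actually stored.
        return sum(len(row) for row in self.rows)


if __name__ == "__main__":
    # Hypothetical catalog: 1,000 attributes, only a handful used per row.
    catalog = [f"attr_{i}" for i in range(998)] + ["product_type", "brand"]
    table = SparseWideTable(catalog)
    table.insert({"product_type": "camera", "attr_7": 12})
    table.insert({"product_type": "lens", "brand": "Canon"})
    table.insert({"product_type": "camera", "brand": "Canon", "attr_7": None})
    print(table.find_attributes("BRAND"))  # ['brand']
    print(table.select({"product_type": "camera", "brand": "Canon"}))  # [2]
    print(table.stored_cells(), "cells vs", 3 * len(catalog), "positional slots")
```

Keying each row by attribute id means storage grows with the number of non-null cells (6 here) rather than with the width of the schema (3,000 slots for a fixed-position layout). A real RDBMS would additionally need typed storage, updates, and cost-based choice among access paths.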