Data management support for statistical data editing and subset selection

Authors:
Robert A. Burnett; James J. Thomas
Venue:
SSDBM'81 Proceedings of the 1st LBL Workshop on Statistical database management
Year:
1981

Abstract

Statistical analysis of large data sets often involves an initial data editing and preparation phase to check the validity of individual data items, check for consistency among related data, correct erroneous data, and supply (impute) values for missing data where possible. During this preparatory phase of analysis, it is often necessary to partition the data set into a number of subsets by logical selection and/or random sampling techniques for purposes of hypothesis testing. This paper examines the data management support required by these editing and subsetting operations in terms of data descriptions, data manipulation functions, and logical and physical data structures. The design of a data management system that seeks to meet these requirements is described in detail. The system, called SDB, is built around a self-describing transposed file structure and supporting data access software. SDB representations of some logical data structures commonly encountered in statistical databases are also described. Experiences with a partial implementation of the system and its application in an interactive data editor have been encouraging.
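To make the abstract's terms concrete, the following is a minimal in-memory sketch of a transposed (column-wise) table in which each column carries its own description, together with the editing and subsetting operations the abstract names: validity checking, imputation, logical selection, and random sampling. It is an illustration only and does not reproduce SDB's file format or interface, which the abstract does not specify. All names (`Column`, `TransposedTable`, `invalid_rows`, `impute_mean`, `select`, `sample`, `subset`) and the use of mean imputation are assumptions made for this example.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Column:
    """One variable stored contiguously, bundled with its own description.

    Hypothetical structure: the description here is just a name, a type
    label, and a validity rule. None marks a missing value.
    """
    name: str
    kind: str
    valid: Callable[[float], bool]
    values: List[Optional[float]] = field(default_factory=list)


class TransposedTable:
    """Column-wise table: each operation touches only the columns it needs."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns = {c.name: c for c in columns}

    def n_rows(self) -> int:
        return len(next(iter(self.columns.values())).values)

    def invalid_rows(self, name: str) -> List[int]:
        """Validity check: rows whose present value fails the column's rule."""
        col = self.columns[name]
        return [i for i, v in enumerate(col.values)
                if v is not None and not col.valid(v)]

    def impute_mean(self, name: str) -> float:
        """Fill missing values with the mean of the present ones (assumed rule)."""
        col = self.columns[name]
        present = [v for v in col.values if v is not None]
        mean = sum(present) / len(present)
        col.values = [mean if v is None else v for v in col.values]
        return mean

    def select(self, name: str, predicate: Callable[[float], bool]) -> List[int]:
        """Logical selection: row indices whose value satisfies the predicate."""
        return [i for i, v in enumerate(self.columns[name].values)
                if v is not None and predicate(v)]

    def sample(self, k: int, seed: int) -> List[int]:
        """Random sampling without replacement, returned as sorted row indices."""
        rng = random.Random(seed)
        return sorted(rng.sample(range(self.n_rows()), k))

    def subset(self, rows: List[int]) -> "TransposedTable":
        """Materialize the chosen rows as a new column-wise table."""
        return TransposedTable([
            Column(c.name, c.kind, c.valid, [c.values[i] for i in rows])
            for c in self.columns.values()
        ])


# Made-up example data, not taken from the paper.
table = TransposedTable([
    Column("age", "int", lambda v: 0 <= v <= 120, [34, 150, None, 51]),
    Column("income", "float", lambda v: v >= 0, [42000.0, 38000.0, 51000.0, None]),
])

suspect = table.invalid_rows("age")             # rows flagged for editing
older = table.select("age", lambda v: v > 40)   # logical subset
table.impute_mean("income")                     # fill the missing income
older_table = table.subset(older)               # subset as its own table
```

In this sketch, checking or imputing one variable reads only that variable's column, which is the usual motivation for a transposed layout when editing passes work one variable at a time.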