Database sampling is commonly used in applications such as data mining and approximate query evaluation to trade the accuracy of the results against the computational cost of the process. The authors have recently proposed using database sampling to populate a prototype database, that is, a database used to support the development of data-intensive applications. Existing methods for constructing prototype databases commonly populate the resulting database with synthetic data values. A more realistic approach is to sample an existing database so that the resulting sample satisfies a predefined set of integrity constraints. The resulting database, with domain-relevant data values and semantics, is expected to better support the software development process. This paper presents a formal study of database sampling. A denotational semantics description of database sampling is discussed first. The paper then characterises the types of integrity constraints that must be considered during sampling. Lastly, the sampling strategy presented here is applied to improve the data quality of a (legacy) database. In this context, database sampling is used to incrementally identify the set of tuples that cause inconsistencies in the database and that the data cleaning process should therefore address.
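To make the idea of a constraint-satisfying sample concrete, the following is a minimal sketch and not the authors' method. It assumes a single foreign-key (referential integrity) constraint between two tables held as lists of dictionaries. The function names, table names, and column names are all hypothetical. The first function draws a random sample of child tuples and then adds every parent tuple they reference. The second lists child tuples whose reference has no matching parent, which is the kind of inconsistent tuple a cleaning step would target.

```python
import random


def sample_with_referential_integrity(children, parents, fk, pk, k, seed=0):
    """Sample up to k child rows, then add every parent row they reference.

    Hypothetical helper: covers one foreign-key constraint only.
    """
    rng = random.Random(seed)
    sampled_children = rng.sample(children, min(k, len(children)))
    # Collect the parent keys that the sampled children depend on.
    needed = {row[fk] for row in sampled_children}
    sampled_parents = [row for row in parents if row[pk] in needed]
    return sampled_children, sampled_parents


def dangling_references(children, parents, fk, pk):
    """Return child rows whose foreign key matches no parent row."""
    keys = {row[pk] for row in parents}
    return [row for row in children if row[fk] not in keys]


# Illustrative data (assumed, not taken from the paper).
departments = [{"dept_id": d, "name": f"dept-{d}"} for d in (1, 2, 3)]
employees = [{"emp_id": i, "dept_id": 1 + i % 3} for i in range(30)]

sample_emps, sample_depts = sample_with_referential_integrity(
    employees, departments, fk="dept_id", pk="dept_id", k=5
)
violations = dangling_references(sample_emps, sample_depts, "dept_id", "dept_id")
```

When the source tables are themselves consistent, the sample built this way contains no dangling references. If the source is inconsistent, a sampled child may reference a parent that does not exist, and `dangling_references` reports it.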