Incorporating Domain-Specific Information Quality Constraints into Database Queries

Authors:
Suzanne M. Embury; Paolo Missier; Sandra Sampaio; R. Mark Greenwood; Alun D. Preece
Affiliations:
University of Manchester; University of Manchester; University of Manchester; University of Manchester; Cardiff University
Venue:
Journal of Data and Information Quality (JDIQ)
Year:
2009

Abstract

The range of information now available in queryable repositories opens up a host of possibilities for new and valuable forms of data analysis. Database query languages such as SQL and XQuery offer a concise, high-level means of implementing such analyses, making it easy to extract relevant data subsets into either generic or bespoke data analysis environments. Unfortunately, the quality of data in these repositories is often highly variable. The data is still useful, but only if the consumer is aware of its quality problems and can work around them. Standard query languages offer little support for this aspect of data management. In principle, however, it should be possible to embed constraints describing the consumer's data quality requirements directly into the query, so that the query evaluator takes over responsibility for enforcing them during query processing. Most previous attempts to incorporate information quality constraints into database queries have been based on a small number of highly generic quality measures, which are defined and computed by the information provider. This approach is useful in some application areas but, in practice, quality criteria are more commonly determined by the user of the information, not by its provider. In this article, we explore an approach to incorporating quality constraints into database queries in which the definition of quality is set by the user rather than the provider of the information. Our approach is based on the concept of a quality view, a configurable quality assessment component into which domain-specific notions of quality can be embedded. We examine how quality views can be incorporated into XQuery and, from this, derive the language features required in general to embed quality views into any query language. We also propose some syntactic sugar on top of XQuery to simplify the process of querying with quality constraints.
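To make the idea of a user-defined quality constraint inside a query more concrete, here is a minimal Python sketch. It assumes a quality view can be modelled as three pluggable parts: a function that gathers quality evidence for each data item, a classifier that maps that evidence to a quality class, and the set of classes the consumer is willing to accept. The names `QualityView` and `query_with_quality`, the record fields (`hit_ratio`, `coverage`), the scoring rule and the thresholds are all hypothetical illustrations; this is not the authors' XQuery-based design or syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Iterable, Iterator, Mapping

Record = Mapping[str, Any]


@dataclass(frozen=True)
class QualityView:
    """Hypothetical user-configured quality assessment component."""
    name: str
    evidence: Callable[[Record], Mapping[str, float]]  # gather quality evidence for one item
    classify: Callable[[Mapping[str, float]], str]     # map evidence to a quality class
    accept: FrozenSet[str]                             # classes this consumer accepts

    def assess(self, item: Record) -> str:
        return self.classify(self.evidence(item))


def query_with_quality(items: Iterable[Record],
                       predicate: Callable[[Record], bool],
                       view: QualityView) -> Iterator[Record]:
    """Evaluate an ordinary selection predicate and the quality constraint together."""
    for item in items:
        if predicate(item) and view.assess(item) in view.accept:
            yield item


# Hypothetical usage: the consumer, not the provider, defines what "good" means.
def score_evidence(r: Record) -> Mapping[str, float]:
    # Illustrative domain-specific rule: average of two quality indicators.
    return {"score": 0.5 * r["hit_ratio"] + 0.5 * r["coverage"]}


def score_class(ev: Mapping[str, float]) -> str:
    if ev["score"] >= 0.5:
        return "high"
    if ev["score"] >= 0.3:
        return "mid"
    return "low"


view = QualityView("identification-quality", score_evidence, score_class,
                   accept=frozenset({"high"}))

proteins = [  # hypothetical data items
    {"accession": "P1", "organism": "human", "hit_ratio": 0.8, "coverage": 0.4},
    {"accession": "P2", "organism": "human", "hit_ratio": 0.2, "coverage": 0.1},
    {"accession": "P3", "organism": "mouse", "hit_ratio": 0.9, "coverage": 0.5},
]

selected = [r["accession"] for r in
            query_with_quality(proteins, lambda r: r["organism"] == "human", view)]
print(selected)
```

The quality check runs inside the query loop, next to the ordinary selection predicate. In this sketch, that is how the evaluator, rather than the consumer, takes responsibility for the constraint. Swapping in a different `QualityView` changes the notion of quality without touching the query itself.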