Quality, trust, and utility of scientific data on the web: towards a joint model

Authors:
Matthew Gamble;Carole Goble
Affiliations:
University of Manchester;University of Manchester
Venue:
Proceedings of the 3rd International Web Science Conference
Year:
2011

Abstract

In science, quality is paramount. As scientists increasingly look to the Web to share and discover scientific data, there is a growing need to support them in assessing the quality of that data. However, quality is an ambiguous and overloaded term. To support the scientific user in discovering useful data, we have systematically examined the nature of "quality" by exploiting three prevalent properties of scientific data sets: (1) data quality is commonly defined objectively; (2) the provenance and lineage of the data's production have a well-understood role; and (3) "fitness-for-use" is a definition of utility rather than of quality or trust, where the quality and trustworthiness of the data, and of the entities that produced it, inform its utility. Our study is presented in two stages. First, we review existing information quality dimensions and detail an assessment-oriented classification. We introduce definitions for quality, trust, and utility in terms of the entities required in their assessment: producer, provider, consumer, process, artifact, and quality standard. Next, we detail a novel and experimental approach to assessment that models the causal relationships between quality, trust, and utility dimensions by constructing decision networks informed by provenance graphs. To ground and motivate our discussion throughout, we draw on the European Bioinformatics Institute's Gene Ontology Annotation database. We present an initial demonstration of our approach with an example that ranks results from the Gene Ontology Annotation database using an emerging objective quality measure, the Gene Ontology Annotation Quality score.
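The idea of ranking results by a utility that is informed by both quality and trust can be illustrated with a very small sketch. This is not the authors' model: the `Annotation` class, the probabilities, the utility table, and the assumption that quality and trust are independent are all hypothetical and chosen only for illustration. The quality probability stands in for an objective quality score, and the trust probability stands in for a judgement that would, in the approach described above, be informed by provenance.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one candidate result to be ranked."""
    identifier: str
    p_high_quality: float      # assumed probability the artifact meets a quality standard
    p_trusted_producer: float  # assumed probability the producer is trustworthy


# Hypothetical utility table for the decision "use this annotation",
# indexed by (is_high_quality, is_trusted). Values are illustrative only.
UTILITY = {
    (True, True): 1.0,
    (True, False): 0.4,
    (False, True): 0.2,
    (False, False): 0.0,
}


def expected_utility(annotation: Annotation) -> float:
    """Sum utility over the joint states of the two chance nodes.

    Quality and trust are treated as independent here, which is a
    simplifying assumption of this sketch.
    """
    total = 0.0
    for quality, trust in product((True, False), repeat=2):
        p_q = annotation.p_high_quality if quality else 1.0 - annotation.p_high_quality
        p_t = annotation.p_trusted_producer if trust else 1.0 - annotation.p_trusted_producer
        total += p_q * p_t * UTILITY[(quality, trust)]
    return total


def rank(annotations):
    """Order annotations from highest to lowest expected utility."""
    return sorted(annotations, key=expected_utility, reverse=True)


# Made-up example data, not taken from any real database.
annotations = [
    Annotation("A1", p_high_quality=0.9, p_trusted_producer=0.3),
    Annotation("A2", p_high_quality=0.6, p_trusted_producer=0.9),
    Annotation("A3", p_high_quality=0.2, p_trusted_producer=0.5),
]
ranked = rank(annotations)
```

In this toy setting an annotation with a lower quality score but a more trusted producer can outrank one with a higher quality score, which is the kind of interaction between quality, trust, and utility that a joint model is meant to capture.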