Methods for evaluating and creating data quality

Authors:
William E. Winkler
Affiliations:
US Bureau of the Census, Statistical Research, Room 3000-4, Washington DC
Venue:
Information Systems - Special issue: Data quality in cooperative information systems
Year:
2004

Abstract

This paper surveys two classes of methods for assessing and improving the quality of individual files or groups of files. The first class comprises edit/imputation methods, which enforce business rules and impute missing data. The second comprises data cleaning methods, which find duplicates within or across files.
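
As a rough illustration of the two classes of methods, the following minimal Python sketch shows one edit check, one simple imputation, and one approximate duplicate search. It is not the paper's method. The edit rule, the field names (`name`, `age`, `marital`), the most-frequent-value imputation, and the 0.85 similarity threshold are all assumptions chosen for the example, and the string comparison uses Python's standard `difflib` rather than any comparator discussed in the paper.

```python
from collections import Counter
from difflib import SequenceMatcher


def fails_edit(record):
    """Hypothetical edit (business rule): nobody under 15 may be 'married'."""
    age = record.get("age")
    return age is not None and age < 15 and record.get("marital") == "married"


def impute_missing(records, field, default=None):
    """Fill missing values of `field` with the most frequent observed value.

    This is a deliberately simple stand-in for real imputation methods.
    """
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = Counter(observed).most_common(1)[0][0] if observed else default
    return [{**r, field: fill} if r.get(field) is None else dict(r)
            for r in records]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(records, field, threshold=0.85):
    """Return index pairs whose `field` values are at least `threshold` similar.

    Compares all pairs, so it is only practical for small files.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i][field], records[j][field]) >= threshold:
                pairs.append((i, j))
    return pairs


# Made-up sample records, for illustration only.
records = [
    {"name": "William Winkler", "age": 40, "marital": "married"},
    {"name": "Wiliam Winkler", "age": 40, "marital": None},
    {"name": "Jane Doe", "age": 12, "marital": "married"},
]

edit_failures = [i for i, r in enumerate(records) if fails_edit(r)]
filled = impute_missing(records, "marital")
duplicate_pairs = find_duplicates(records, "name")
```

On this sample, the third record fails the edit, the missing marital status in the second record is filled from the observed values, and the first two records are flagged as a likely duplicate pair. The all-pairs comparison is the simplest possible design and would need a more scalable strategy for large files.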