Business rules are an effective way to control data quality. Business experts can enter the rules directly into appropriate software, avoiding error-prone communication with programmers. However, not all business situations and possible data quality problems can be anticipated. Where business rules have not yet been defined, patterns of data handling may still arise in practice. We apply data mining to accounting transactions to discover such patterns, which are represented in the form of association rules. Deviations from the discovered patterns can then be marked as potential data quality violations to be examined by humans. Data quality breaches can be expensive, but manual examination of many transactions is also expensive. The goal is therefore to find a balance between marking too many and too few transactions as potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the discovered association rules and use economic principles to support the decision on how many deviations to examine manually.
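The idea of mining patterns, flagging deviations, and limiting manual review on economic grounds can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper. The field names (`account`, `vat_code`), the support and confidence thresholds, and the cost figures are hypothetical, and the rule confidence is used as a rough stand-in for the probability that a deviation is a real error, which is an assumption made only for this illustration.

```python
from collections import Counter


def mine_rules(transactions, antecedent, consequent,
               min_support=2, min_confidence=0.8):
    """Find single-attribute rules 'antecedent value -> consequent value'.

    Returns {antecedent_value: (consequent_value, confidence)}, keeping the
    highest-confidence rule for each antecedent value.
    """
    pair_counts = Counter((t[antecedent], t[consequent]) for t in transactions)
    ante_counts = Counter(t[antecedent] for t in transactions)
    rules = {}
    for (a, c), n in pair_counts.items():
        confidence = n / ante_counts[a]
        if n < min_support or confidence < min_confidence:
            continue
        if a not in rules or confidence > rules[a][1]:
            rules[a] = (c, confidence)
    return rules


def flag_deviations(transactions, rules, antecedent, consequent):
    """Return (index, rule_confidence) for transactions that match a rule's
    antecedent but not its consequent, strongest rules first."""
    flagged = []
    for i, t in enumerate(transactions):
        rule = rules.get(t[antecedent])
        if rule is not None and t[consequent] != rule[0]:
            flagged.append((i, rule[1]))
    return sorted(flagged, key=lambda item: -item[1])


def select_for_review(flagged, review_cost, error_cost):
    """Keep a deviation only if the expected loss from ignoring it exceeds
    the cost of examining it (assumption: rule confidence approximates the
    probability that the deviation is a genuine error)."""
    return [i for i, p in flagged if p * error_cost > review_cost]


# Hypothetical accounting transactions: account 6000 is almost always
# posted with VAT code V20; account 7000 shows no stable pattern.
transactions = (
    [{"account": "6000", "vat_code": "V20"}] * 9
    + [{"account": "6000", "vat_code": "V0"}]
    + [{"account": "7000", "vat_code": "V0"}] * 3
    + [{"account": "7000", "vat_code": "V20"}] * 2
)

rules = mine_rules(transactions, "account", "vat_code")
flagged = flag_deviations(transactions, rules, "account", "vat_code")
to_review = select_for_review(flagged, review_cost=5.0, error_cost=100.0)
print(rules)
print(to_review)
```

Raising `review_cost` relative to `error_cost` shrinks the review list, which is the trade-off between marking too many and too few transactions. In practice the error probability would be estimated from labeled or audited transactions rather than taken from the rule confidence.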