An Incremental Knowledge Acquisition Method for Improving Duplicate Invoices Detection

  • Authors:
  • Van Hai Ho;Paul Compton;Boualem Benatallah;Julien Vayssière;Lucio Menzel;Hartmut Vogler

  • Venue:
  • ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
  • Year:
  • 2009

Abstract

Duplicate records are a major problem, and duplicate invoices are a specific example of it. Detecting duplicate invoices is critical for business, since a duplicate can result in a company paying more than once for the goods or services ordered. Past experience has shown that generic duplicate record detection techniques are not very useful when applied to invoices: the rate of false positives can be so high that invoice clerks are discouraged from using the system. This is because such approaches do not take the business context into account, e.g., what types of goods were ordered and the past relationship with the vendor. In this paper, we discuss applying Ripple Down Rules (RDR), an approach for incremental and end-user-centred knowledge acquisition, to the problem of classifying pairs of potential duplicate invoices. We describe how we built a prototype on top of the SAP ERP product and evaluated it on a real data set that had previously been independently audited for duplicates. The preliminary results highlight the significant potential of this approach for assisting invoicing clerks in processing potential duplicate invoices. We observed a drop in the rate of false positives from 92% to 18.66% compared with traditional approaches that do not take the business context into account. We suggest that incremental development of domain-specific knowledge may have more general application to the problem of handling duplicate records.
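
As a rough illustration of the Ripple Down Rules idea named in the abstract, the following is a minimal sketch of a single-classification RDR tree applied to an invoice pair. It is not the paper's prototype and does not use any SAP API. The feature names (`same_vendor`, `same_amount`, `recurring_service`, `same_date`), the example rules, and the default conclusion are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical representation: an invoice pair is a dict of boolean features.
Pair = Dict[str, bool]


@dataclass
class Rule:
    condition: Callable[[Pair], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception branch: refines this rule when it fires
    if_false: Optional["Rule"] = None  # alternative branch: tried when this rule does not fire


def classify(root: Optional[Rule], pair: Pair, default: str = "not duplicate") -> str:
    """Return the conclusion of the last rule that fired on the path through the tree."""
    conclusion = default
    node = root
    while node is not None:
        if node.condition(pair):
            conclusion = node.conclusion
            node = node.if_true
        else:
            node = node.if_false
    return conclusion


# Initial generic rule (assumed): same vendor and same amount suggests a duplicate.
root = Rule(lambda p: p["same_vendor"] and p["same_amount"], "duplicate")

# Later, a clerk sees a false positive and adds an exception carrying business
# context (assumed example): recurring services billed on different dates.
root.if_true = Rule(
    lambda p: p["recurring_service"] and not p["same_date"],
    "not duplicate",
)

one_off = {"same_vendor": True, "same_amount": True,
           "recurring_service": False, "same_date": True}
monthly = {"same_vendor": True, "same_amount": True,
           "recurring_service": True, "same_date": False}

print(classify(root, one_off))
print(classify(root, monthly))
```

In this sketch the original rule is never edited. Knowledge is added only as a local exception attached to the rule that misfired, which is the incremental, end-user-driven style of refinement the abstract refers to.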