Duplicate records are a major problem, and duplicate invoices are a specific example. Detecting duplicate invoices is critical for businesses, since a duplicate can result in a company paying more than once for goods or services ordered. Past experience has shown that generic duplicate record detection techniques are not very useful when applied to invoices: the rate of false positives can be so high that invoice clerks are discouraged from using the system. This is because such approaches do not take the business context into account, e.g. what types of goods were ordered and the past relationship with the vendor. In this paper, we discuss applying Ripple Down Rules (RDR), an approach for incremental, end-user-centred knowledge acquisition, to the problem of classifying pairs of potential duplicate invoices. We describe how we built a prototype on top of the SAP ERP product and evaluated it on a real data set that had previously been independently audited for duplicates. The preliminary results highlight the significant potential of this approach for assisting invoice clerks in processing potential duplicate invoices. We observed a drop in the rate of false positives from 92% to 18.66% compared with traditional approaches that do not take the business context into account. We suggest that incremental development of domain-specific knowledge may apply more generally to the problem of handling duplicate records.
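To make the idea of incremental, exception-based rules concrete, here is a minimal sketch of a single-classification Ripple Down Rules tree applied to invoice pairs. It is an illustration under stated assumptions, not the prototype described above: the feature names (`same_vendor`, `same_amount`, `goods_type`, `same_invoice_date`), the example rules, and the helpers `trace`, `classify` and `add_rule` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    condition: Callable[[dict], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception branch, tried when this rule fires
    if_false: Optional["Rule"] = None  # alternative branch, tried when it does not


def trace(root, pair):
    """Walk the tree; return (last rule that fired, last rule visited)."""
    fired, last, node = None, None, root
    while node is not None:
        last = node
        if node.condition(pair):
            fired = node
            node = node.if_true
        else:
            node = node.if_false
    return fired, last


def classify(root, pair):
    """The conclusion is that of the last rule that fired (the root always fires)."""
    fired, _ = trace(root, pair)
    return fired.conclusion


def add_rule(root, pair, condition, conclusion):
    """Attach a correction where inference ended for the misclassified pair."""
    _, last = trace(root, pair)
    new_rule = Rule(condition, conclusion)
    if last.condition(pair):
        last.if_true = new_rule    # refine a rule that fired wrongly
    else:
        last.if_false = new_rule   # cover a case no rule handled


# Hypothetical invoice-pair features, for illustration only.
supplies = {"same_vendor": True, "same_amount": True,
            "goods_type": "office supplies", "same_invoice_date": True}
rent = {"same_vendor": True, "same_amount": True,
        "goods_type": "monthly rent", "same_invoice_date": False}

# Default rule: always fires, so every pair gets some conclusion.
root = Rule(lambda p: True, "not duplicate")

# A clerk sees a missed duplicate and adds a generic rule.
add_rule(root, supplies,
         lambda p: p["same_vendor"] and p["same_amount"], "duplicate")
before = classify(root, rent)  # the generic rule produces a false positive

# The clerk corrects it with an exception that uses business context.
add_rule(root, rent,
         lambda p: p["goods_type"] == "monthly rent"
         and not p["same_invoice_date"], "not duplicate")
after = classify(root, rent)
```

In this sketch each correction is attached only at the point where inference ended for the misclassified pair, so existing rules are left unchanged and earlier cases such as `supplies` keep their classification.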