Detecting Errors in Foreign Trade Transactions: Dealing with Insufficient Data

Authors:
Luis Torgo; Welma Pereira; Carlos Soares
Affiliations:
LIAAD-INESC Porto, Univ. of Porto, Porto, Portugal 4050-190 and Faculdade de Ciências, University of Porto; LIAAD-INESC Porto, Univ. of Porto, Porto, Portugal 4050-190; LIAAD-INESC Porto, Univ. of Porto, Porto, Portugal 4050-190 and Faculdade de Economia, University of Porto
Venue:
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
Year:
2009

Abstract

This paper describes a data mining approach to detecting erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). Erroneous transactions are a minority, but they still have a significant impact on the official statistics produced by INE. Detecting these rare errors is a manual, time-consuming task, constrained by limited resources (e.g. financial, human). These constraints are common to many other data analysis problems (e.g. fraud detection). Our previous work addresses this issue by producing a ranking of outlyingness, which allows the available resources to be allocated to the most relevant cases. It is based on an adaptation of hierarchical clustering methods for outlier detection. However, that method cannot be applied to articles with a small number of transactions. In this paper, we complement the previous approach with standard statistical methods for outlier detection to handle articles with few transactions. Our experiments clearly show the advantages of the combined approach in terms of the criteria outlined by INE for considering any method applicable to this business problem. The generality of the approach remains to be tested on other problems that share the same constraints (e.g. fraud detection).
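
The abstract does not name the statistical methods used for articles with few transactions. As an illustration only, the sketch below assumes a boxplot-style rule: each transaction is scored by how far its value lies outside the interquartile range, and the scores are turned into a ranking so that a fixed inspection budget goes to the most suspicious cases. The function names, the `budget` parameter, and the sample unit prices are hypothetical and are not taken from the paper.

```python
from statistics import quantiles


def iqr_outlier_scores(values):
    """Score each value by its distance outside [Q1, Q3], in IQR units.

    Assumption: a boxplot-style rule stands in for the unspecified
    "standard statistical methods". Requires at least two values.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    scores = []
    for v in values:
        if q1 <= v <= q3:
            scores.append(0.0)            # inside the box: not suspicious
        elif iqr == 0:
            scores.append(float("inf"))   # any deviation from a constant box
        elif v < q1:
            scores.append((q1 - v) / iqr)
        else:
            scores.append((v - q3) / iqr)
    return scores


def rank_for_inspection(values, budget):
    """Return the `budget` most outlying transactions as (index, value, score)."""
    scores = iqr_outlier_scores(values)
    order = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return [(i, values[i], scores[i]) for i in order[:budget]]


# Hypothetical unit prices for one article with only seven transactions.
unit_prices = [10.2, 9.8, 10.5, 10.1, 98.0, 9.9, 10.3]
top_cases = rank_for_inspection(unit_prices, budget=2)
```

Here `top_cases` lists the two transactions an expert would check first under a budget of two inspections. Producing a ranking rather than a yes/no label mirrors the resource-bounded setting described in the abstract, where the number of cases that can be inspected is fixed in advance.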