Finding fraud in health insurance data with two-layer outlier detection approach

Authors:
Rob M. Konijn; Wojtek Kowalczyk
Affiliations:
Department of Computer Science, VU University Amsterdam; Department of Computer Science, VU University Amsterdam
Venue:
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Year:
2011


Abstract

Conventional outlier detection techniques look for isolated observations that differ significantly from the other observations stored in a database. For example, in the context of health insurance, one might be interested in finding unusual claims for prescribed medicines. Each claim record may contain information on the prescribed drug (its code), the volume (e.g., the number of pills and their weight), the dosage and the price. Finding outliers in such data can help identify fraud. However, when searching for fraud, it is more important to analyse data not at the level of single records, but at the level of single patients, pharmacies or GPs. In this paper we present a novel approach for finding outliers in such hierarchical data. Our method uses standard techniques to measure the outlierness of single records and then aggregates these measurements to detect outliers among entities that are higher in the hierarchy. We applied this method to a set of about 40 million records from a health insurance company to identify suspicious pharmacies.
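
The two-layer idea described in the abstract (score single records, then aggregate the scores per entity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record-level score (a robust z-score of price within each drug code), the aggregation rule (fraction of a pharmacy's claims above a threshold), the threshold value, and the field names `drug`, `price` and `pharmacy` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median


def record_scores(claims):
    """Layer 1 (assumed scoring rule): robust z-score of each claim's
    price relative to other claims for the same drug code."""
    prices_by_drug = defaultdict(list)
    for claim in claims:
        prices_by_drug[claim["drug"]].append(claim["price"])

    # Median and median absolute deviation (MAD) per drug code.
    stats = {}
    for drug, prices in prices_by_drug.items():
        med = median(prices)
        mad = median([abs(p - med) for p in prices])
        stats[drug] = (med, mad)

    scores = []
    for claim in claims:
        med, mad = stats[claim["drug"]]
        # Fall back to a scale of 1.0 when the MAD is zero.
        scale = mad if mad > 0 else 1.0
        scores.append(abs(claim["price"] - med) / scale)
    return scores


def pharmacy_scores(claims, scores, threshold=3.0):
    """Layer 2 (assumed aggregation rule): fraction of each pharmacy's
    claims whose record-level score exceeds the threshold."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for claim, score in zip(claims, scores):
        total[claim["pharmacy"]] += 1
        if score > threshold:
            flagged[claim["pharmacy"]] += 1
    return {ph: flagged[ph] / total[ph] for ph in total}


def rank_pharmacies(claims, threshold=3.0):
    """Return pharmacies ordered from most to least suspicious."""
    agg = pharmacy_scores(claims, record_scores(claims), threshold)
    return sorted(agg.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_pharmacies(claims)` on a list of claim dictionaries returns the pharmacies ordered by their aggregated score. A single expensive claim barely moves a pharmacy's score under this rule, whereas a pharmacy with many unusual claims rises to the top, which is the reason for aggregating rather than inspecting records one by one.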