Camouflaged fraud detection in domains with complex relationships

Authors:
Sankar Virdhagriswaran;Gordon Dakin
Affiliations:
Xerox Labs, Webster, NY;Aspen Technologies, Cambridge, MA
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

We describe a data mining system to detect frauds that are camouflaged to look like normal activities in domains with a high number of known relationships. Examples include accounting fraud detection for rating and investment, insider attacks on corporate networks, and health care insurance fraud. Our goal is to help analysts who are overwhelmed with information about companies, on-line system access logs, or insurance claims to focus their attention on features that cause damage in the future. We focused on accounting fraud, where the task is to detect the subset of companies that were potentially committing accounting fraud within the total population of public companies that submit quarterly and annual filings to the Securities and Exchange Commission (SEC). Using (a) a representation of changes and (b) a mix of decision tree learning, locally weighted logistic regression, k-means clustering, and constant regression in a two-phase pipeline, we developed models that rank companies by the predicted probability of future damaging performance. The learned models were tested extensively over four years with public data available from SEC filings and private data available from rating companies and investment firms. Cross-validation experiments and analyst-based validation of private experiments showed that the approach performed as well as or better than domain experts and discovered new relationships that domain experts did not use on a regular basis. Finally, the detections preceded public knowledge of such problems by six to eighteen months.
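
To make the idea of ranking companies on a "representation of changes" concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not specify the features, the change encoding, or how the two phases divide the work, so this example simply assumes period-over-period relative changes and a single logistic score. The function names, the metric names (`receivables`, `cash_flow`), the weights, and the company data are all hypothetical.

```python
from math import exp


def relative_changes(values):
    """Period-over-period relative change; 0.0 where the previous value is zero."""
    return [
        (curr - prev) / abs(prev) if prev != 0 else 0.0
        for prev, curr in zip(values, values[1:])
    ]


def risk_score(features, weights, bias=0.0):
    """Logistic score in (0, 1) built from the latest change in each metric.

    Each series must hold at least two periods so that a change exists.
    The weights are illustrative assumptions, not learned values.
    """
    z = bias + sum(
        weights[name] * relative_changes(series)[-1]
        for name, series in features.items()
    )
    return 1.0 / (1.0 + exp(-z))


def rank_companies(filings, weights):
    """Return (company, score) pairs sorted from highest to lowest score."""
    scored = [(name, risk_score(feats, weights)) for name, feats in filings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-period data for two made-up companies.
filings = {
    "AlphaCo": {"receivables": [100.0, 150.0], "cash_flow": [50.0, 40.0]},
    "BetaCo": {"receivables": [100.0, 102.0], "cash_flow": [50.0, 51.0]},
}
# Assumed weights: rising receivables raise the score, rising cash flow lowers it.
weights = {"receivables": 2.0, "cash_flow": -2.0}

ranking = rank_companies(filings, weights)
```

In this toy data, AlphaCo's receivables jump while its cash flow falls, so it is ranked ahead of BetaCo. A real system along the lines of the abstract would learn its models from labeled filings rather than use hand-set weights.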