Camouflaged fraud detection in domains with complex relationships

  • Authors:
  • Sankar Virdhagriswaran;Gordon Dakin

  • Affiliations:
  • Xerox Labs, Webster, NY;Aspen Technologies, Cambridge, MA

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Abstract

We describe a data mining system that detects fraud camouflaged to look like normal activity in domains with a large number of known relationships. Examples include accounting fraud detection for rating and investment, insider attacks on corporate networks, and health care insurance fraud. Our goal is to help analysts who are overwhelmed with information about companies, online system access logs, or insurance claims to focus their attention on the features that cause damage in the future. We focused on accounting fraud, where the task is to detect the subset of companies potentially committing accounting fraud within the total population of public companies that submit quarterly and annual filings to the Securities and Exchange Commission (SEC). Using (a) a representation of changes and (b) a mix of decision tree learning, locally weighted logistic regression, k-means clustering, and constant regression in a two-phase pipeline, we developed models that rank companies by the predicted probability of future damaging performance. The learned models were tested extensively over four years with public data available from SEC filings and private data available from rating companies and investment firms. Cross-validation experiments and analyst-based validation of private experiments showed that the approach performed as well as or better than domain experts and discovered new relationships that domain experts did not use on a regular basis. Finally, the detections preceded public knowledge of such problems by six to eighteen months.
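
The abstract does not give implementation details, so the following is only a minimal sketch of two of the ideas it names: representing each company by changes between filings, and ranking companies by a probability-like score. Everything here is an assumption made for illustration: the function names, the two toy metrics (receivables and revenue), the made-up figures, and the hand-set weights. A plain logistic score stands in for the learned models; it is not the authors' two-phase pipeline.

```python
import math

def relative_changes(series):
    """Period-over-period relative change for one metric history."""
    changes = []
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            # Assumption: treat an undefined change as no change.
            changes.append(0.0)
        else:
            changes.append((curr - prev) / abs(prev))
    return changes

def risk_score(features, weights, bias):
    """Logistic score in (0, 1). The weights are hypothetical, not learned."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_companies(histories, weights, bias):
    """Return (company, score) pairs sorted from highest to lowest score."""
    scored = []
    for name, metrics in histories.items():
        # Feature vector: the most recent change of each metric.
        features = [relative_changes(series)[-1] for series in metrics]
        scored.append((name, risk_score(features, weights, bias)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up data: [receivables, revenue] over three quarters per company.
histories = {
    "A": [[100, 110, 180], [500, 510, 505]],
    "B": [[100, 102, 104], [500, 520, 540]],
}

# Hypothetical weights: rising receivables raise the score,
# rising revenue lowers it.
ranking = rank_companies(histories, weights=[2.0, -2.0], bias=-1.0)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy data, company A's receivables jump while its revenue stays flat, so it ranks above company B. An analyst-facing system along the lines the abstract describes would learn the scoring function from labeled filings rather than fix the weights by hand.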