Using social network knowledge for detecting spider constructions in social security fraud

  • Authors:
  • Véronique Van Vlasselaer;Jan Meskens;Dries Van Dromme;Bart Baesens

  • Affiliations:
  • Katholieke Universiteit Leuven, Leuven, Belgium;Research Center, Brussels, Belgium;Research Center, Brussels, Belgium;Katholieke Universiteit Leuven, Leuven, Belgium and University of Southampton, Highfield Southampton, United Kingdom

  • Venue:
  • Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
  • Year:
  • 2013

Abstract

Social networks offer a vast amount of additional information that can enrich standard learning algorithms, but the main challenge is extracting relevant information from networked data. Fraudulent behavior is concealed in both local and relational data, which makes it even harder to define useful inputs for prediction models. Starting from expert knowledge, this paper efficiently incorporates social network effects to detect fraud for the Belgian governmental social security institution, and it improves the performance of traditional non-relational fraud prediction tasks. Of the many types of social security fraud, this paper concentrates on payment fraud: predicting which companies intentionally evade their payment obligations to the government. We introduce a new fraudulent structure, the so-called spider construction, which translates easily into social network terms and can be included in the learning algorithms. Because it focuses on the egonet of each company, the proposed method can handle large-scale networks. To address the skewed class distribution, SMOTE is applied to rebalance the data. The models were trained at different timestamps and evaluated over varying time windows. Using techniques such as Random Forest, logistic regression, and Naive Bayes, this paper shows that the combined relational model improves both the AUC score and the precision of the predictions compared with a base scenario that uses only local variables.
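Two building blocks named in the abstract, egonet extraction and SMOTE rebalancing, can be illustrated with a small standard-library sketch. This is not the authors' method. The graph is assumed to be undirected and stored as an adjacency dictionary. The relational feature `fraud_neighbour_share` is a hypothetical example of an egonet-derived variable, not one of the paper's spider-construction features. `smote` is a simplified SMOTE variant that picks a random minority sample for each synthetic point and interpolates toward one of its k nearest minority neighbours. All function and variable names are illustrative assumptions.

```python
import math
import random


def egonet(adj, ego):
    """Return (nodes, edges) of the 1-hop egonet of `ego`.

    Assumption: `adj` maps each node to the set of its neighbours
    in an undirected graph.
    """
    nodes = {ego} | adj.get(ego, set())
    # Keep every edge whose two endpoints both lie inside the egonet.
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adj.get(u, set())
        if v in nodes and u != v
    }
    return nodes, edges


def fraud_neighbour_share(adj, ego, fraud_labels):
    """Hypothetical relational feature: the fraction of the ego's direct
    neighbours labelled fraudulent (1). Returns 0.0 for isolated nodes."""
    neighbours = adj.get(ego, set())
    if not neighbours:
        return 0.0
    return sum(fraud_labels.get(n, 0) for n in neighbours) / len(neighbours)


def smote(minority, n_synthetic, k=3, seed=0):
    """Simplified SMOTE: generate `n_synthetic` points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Candidate neighbours: every other minority sample, sorted by distance.
        others = minority[:i] + minority[i + 1:]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Usage on a toy network of four companies (illustrative data only).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
labels = {"B": 1, "D": 1}

nodes, edges = egonet(adj, "C")
print(len(edges))  # 4
print(round(fraud_neighbour_share(adj, "C", labels), 3))  # 0.667

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote(minority, n_synthetic=3, k=2))
```

Because each synthetic point lies on a segment between two existing minority samples, SMOTE adds variety to the minority class instead of only duplicating records. Restricting the feature computation to each node's egonet keeps the work local, which is the property the abstract credits for handling large networks.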