The Measurement of Distinguishing Ability of Classification in Data Mining Model and Its Statistical Significance

  • Authors:
  • Lingling Zhang;Qingxi Wang;Jie Wei;Xiao Wang;Yong Shi

  • Affiliations:
  • Graduate University of Chinese Academy of Sciences, Beijing, China (100190) and Research Centre on Fictitious Economy and Data Science, CAS, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Research Centre on Fictitious Economy and Data Science, CAS, Beijing, China (100190) and College of Information Science and Technology, University of Nebraska at Omaha, Omaha, USA NE 68118

  • Venue:
  • ICCS 2009 Proceedings of the 9th International Conference on Computational Science
  • Year:
  • 2009

Abstract

To test how well a data mining model can distinguish between observation points of different types, we propose indicators that measure the difference between the score distributions of positive and negative points. First, we use the overlapping area of the two score distributions, called the overlapping degree, to describe this difference, and we discuss its properties. Second, we present graphical and quantitative indicators of a model's distinguishing ability: the Lorenz curve, the Gini coefficient, and AR, as well as the closely related ROC curve and AUC. We prove that AUC and AR are exactly linearly related. Finally, we construct a nonparametric statistic for AUC; unlike with the K-S statistic, however, we cannot conclude that the null hypothesis becomes harder to reject when negative points make up a smaller proportion.
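
The indicators named in the abstract can be illustrated with a small sketch. The abstract does not state the exact form of the linear relation between AUC and AR; the code below assumes the commonly cited form AR = 2 * AUC - 1, and it computes AUC by pairwise comparison (the Mann-Whitney formulation), which may differ from the authors' construction. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive point
    scores higher; ties count as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_ratio(pos_scores, neg_scores):
    """AR under the assumed linear relation AR = 2 * AUC - 1."""
    return 2.0 * auc(pos_scores, neg_scores) - 1.0


def ks_statistic(pos_scores, neg_scores):
    """Largest gap between the two empirical score distribution functions."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in pos_scores) / len(pos_scores)
        cdf_neg = sum(s <= t for s in neg_scores) / len(neg_scores)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical scores, not data from the paper.
positives = [0.9, 0.8, 0.7, 0.4]
negatives = [0.6, 0.5, 0.3, 0.2]

print("AUC:", auc(positives, negatives))
print("AR: ", accuracy_ratio(positives, negatives))
print("K-S:", ks_statistic(positives, negatives))
```

Under this assumed relation, a model that ranks points at random has AUC = 0.5 and AR = 0, while perfect separation gives AUC = 1 and AR = 1.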