The Measurement of Distinguishing Ability of Classification in Data Mining Model and Its Statistical Significance

Authors:
Lingling Zhang;Qingxi Wang;Jie Wei;Xiao Wang;Yong Shi
Affiliations:
Graduate University of Chinese Academy of Sciences, Beijing, China (100190) and Research Centre on Fictitious Economy and Data Science, CAS, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Graduate University of Chinese Academy of Sciences, Beijing, China (100190);Research Centre on Fictitious Economy and Data Science, CAS, Beijing, China (100190) and College of Information Science and Technology, University of Nebraska at Omaha, Omaha, USA NE 68118
Venue:
ICCS 2009 Proceedings of the 9th International Conference on Computational Science
Year:
2009

Abstract

To test how well a data mining model can distinguish between observation points of different classes, we propose indicators that measure the difference between the score distributions of positive and negative points. First, we use the overlapping area of the two score distributions, which we call the overlapping degree, to describe this difference, and we discuss its properties. Second, we put forward graphical and quantitative indicators of a model's distinguishing ability: the Lorenz curve, the Gini coefficient, and AR, as well as the closely related ROC curve and AUC. We prove that AUC and AR are exactly linearly related. Finally, we construct a nonparametric test statistic for AUC; unlike with the K-S statistic, however, we cannot conclude that the null hypothesis is harder to reject when negative points make up a smaller proportion of the sample.
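
The indicators named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that higher scores indicate the positive class, that AUC is computed as the pairwise (Mann-Whitney) win rate, and that the linear relation between AR and AUC takes the commonly cited form AR = 2·AUC − 1. The z-score uses the standard Mann-Whitney null variance without tie correction, which may differ from the statistic constructed in the paper. All function names are hypothetical.

```python
from itertools import product
from math import sqrt


def auc(pos, neg):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win (assumption: higher score = more positive).
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def accuracy_ratio(pos, neg):
    """AR under the assumed linear relation AR = 2 * AUC - 1."""
    return 2.0 * auc(pos, neg) - 1.0


def ks_statistic(pos, neg):
    """Largest gap between the two empirical score distribution functions."""
    def ecdf(sample, t):
        return sum(1 for s in sample if s <= t) / len(sample)

    thresholds = sorted(set(pos) | set(neg))
    return max(abs(ecdf(pos, t) - ecdf(neg, t)) for t in thresholds)


def auc_z_score(pos, neg):
    """Normal-approximation z-score for H0: AUC = 0.5.

    Assumption: Mann-Whitney null variance, no correction for ties.
    """
    n_pos, n_neg = len(pos), len(neg)
    std_err = sqrt((n_pos + n_neg + 1) / (12.0 * n_pos * n_neg))
    return (auc(pos, neg) - 0.5) / std_err


if __name__ == "__main__":
    # Made-up scores for illustration only.
    pos_scores = [0.9, 0.8, 0.7, 0.4]
    neg_scores = [0.6, 0.5, 0.3, 0.2]
    print("AUC:", auc(pos_scores, neg_scores))
    print("AR :", accuracy_ratio(pos_scores, neg_scores))
    print("K-S:", ks_statistic(pos_scores, neg_scores))
    print("z  :", auc_z_score(pos_scores, neg_scores))
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a small illustration; a rank-based computation would be preferable for large score sets.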