Instance-Based Learning Algorithms
Machine Learning
C4.5: programs for machine learning
C4.5: programs for machine learning
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
Complexity Measures of Supervised Classification Problems
IEEE Transactions on Pattern Analysis and Machine Intelligence
On Classifier Domains of Competence
ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 1 - Volume 01
Data Complexity in Pattern Recognition (Advanced Information and Knowledge Processing)
Data Complexity in Pattern Recognition (Advanced Information and Knowledge Processing)
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
On dataset complexity for case base maintenance
ICCBR'11 Proceedings of the 19th international conference on Case-Based Reasoning Research and Development
Towards UCI+: A mindful repository design
Information Sciences: an International Journal
Hi-index | 0.00 |
This paper deals with the characterization of data complexity and the relationship with the classification accuracy. We study three dimensions of data complexity: the length of the class boundary, the number of features, and the number of instances of the data set. We find that the length of the class boundary is the most relevant dimension of complexity, since it can be used as an estimate of the maximum achievable accuracy rate of a classifier. The number of attributes and the number of instances do not affect classifier accuracy by themselves, if the boundary length is kept constant. The study emphasizes the use of measures revealing the intrinsic structure of data and recommends their use to extract conclusions on classifier behavior and their relative performance in multiple comparison experiments.