Fano's inequality lower bounds the probability of transmission error through a communication channel. Applied to classification, it yields a lower bound on the Bayes error rate and motivates the widely used Infomax principle. In modern machine learning, however, we are often interested in more than the error rate. In medical diagnosis, for example, different errors incur different costs, so the overall risk is cost-sensitive. Two other popular criteria are the balanced error rate (BER) and the F-score. In this work, we focus on the two-class problem and use a general definition of conditional entropy (which includes Shannon's as a special case) to derive upper and lower bounds on the optimal F-score, BER, and cost-sensitive risk, extending Fano's result. As a consequence, we show that Infomax is not suited to optimizing F-score or cost-sensitive risk, in that it can lead to low F-score and high risk. For cost-sensitive risk, we propose a new conditional entropy formulation that avoids this inconsistency. In addition, we consider the common practice of tuning a classifier by thresholding the posterior probability. As is widely known, a threshold of 0.5, where the posteriors cross, minimizes the error rate; we derive analogous optimal thresholds for F-score and BER.
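For concreteness, recall the binary form of Fano's inequality that the abstract builds on: with P_e the Bayes error and h the binary entropy function, H(Y|X) <= h(P_e), hence P_e >= h^{-1}(H(Y|X)) on [0, 1/2]. The sketch below is illustrative only, not code from the paper; the synthetic data and function names are hypothetical. It assumes calibrated posteriors (labels drawn as y ~ Bernoulli(p)) and sweeps a decision threshold t, showing empirically that the error-minimizing threshold sits near 0.5 while the BER- and F-optimal thresholds move away from 0.5 under class imbalance.

```python
# Hypothetical illustration (not from the paper): sweep a decision threshold t
# over calibrated posteriors and compare error rate, balanced error rate (BER),
# and F-score. With calibrated posteriors, t = 0.5 should empirically minimize
# the error rate; the BER and F-score optima generally land elsewhere.
import numpy as np

def metrics_at_threshold(posteriors, labels, t):
    """Return (error rate, BER, F-score) when predicting y=1 iff posterior > t."""
    preds = (posteriors > t).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    tn = np.sum((preds == 0) & (labels == 0))
    error = (fp + fn) / len(labels)
    ber = 0.5 * (fn / max(tp + fn, 1) + fp / max(fp + tn, 1))
    fscore = 2 * tp / max(2 * tp + fp + fn, 1)
    return error, ber, fscore

rng = np.random.default_rng(0)
# Draw skewed posteriors p = P(y=1|x), then sample labels y ~ Bernoulli(p),
# so the posteriors are exactly calibrated and the positive class is rare.
p = rng.beta(1.0, 3.0, size=200_000)
labels = (rng.random(p.size) < p).astype(int)

ts = np.linspace(0.01, 0.99, 99)
results = np.array([metrics_at_threshold(p, labels, t) for t in ts])
print("t minimizing error:", ts[results[:, 0].argmin()])  # expect ~0.5
print("t minimizing BER:  ", ts[results[:, 1].argmin()])
print("t maximizing F:    ", ts[results[:, 2].argmax()])
```

On skewed data of this kind the three optimal thresholds separate, which is the abstract's point: error rate, BER, and F-score are genuinely different objectives, and a criterion (or threshold) chosen for one can be poor for the others.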