Ordinal classification is a form of multi-class classification in which there is an inherent ordering between the classes, but no meaningful numeric difference between them. Little attention has been paid to how such problems should be evaluated; many authors simply report accuracy, which does not account for the severity of an error. Several evaluation metrics are compared on a dataset for the problem of classifying user reviews, where the data is highly skewed towards the highest values. Mean squared error is found to be the best metric when we prefer a larger number of small errors in order to reduce the number of large ones, while mean absolute error is also a good metric if we instead prefer fewer errors overall and can tolerate the occasional large error.
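The distinction between the two metrics can be seen with a small worked example. This is a minimal sketch with hypothetical 1–5 review scores (not data from the paper): one classifier makes several off-by-one errors, the other makes a single large error, and the two metrics rank them differently.

```python
# Illustrative comparison of MAE and MSE on ordinal predictions.
# All labels and predictions below are hypothetical, chosen only to
# show how the two metrics trade off error count against error size.

def mae(true, pred):
    """Mean absolute error: every class of error counts linearly,
    so a few large errors are tolerated."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def mse(true, pred):
    """Mean squared error: large errors are penalized quadratically,
    so many small errors beat a few large ones."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

true = [5, 5, 5, 4, 3]
many_small = [4, 4, 4, 3, 3]   # four off-by-one errors
few_large  = [5, 5, 5, 1, 3]   # one three-class error

print(mae(true, many_small), mse(true, many_small))  # 0.8 0.8
print(mae(true, few_large),  mse(true, few_large))   # 0.6 1.8
```

Under MAE the second classifier looks better (0.6 < 0.8) because it makes fewer errors in total, while under MSE the first looks better (0.8 < 1.8) because its errors are all small; this is the preference trade-off the abstract describes.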