Assessment of Classification Models with Small Amounts of Data

Authors:
Boštjan Brumen;Matjaž B. Jurič;Tatjana Welzer;Ivan Rozman;Hannu Jaakkola;Apostolos Papadopoulos
Affiliations:
Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, Si-2000 Maribor, Slovenia, e-mail: bostjan.brumen@uni-mb.si;Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, Si-2000 Maribor, Slovenia, e-mail: bostjan.brumen@uni-mb.si;Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, Si-2000 Maribor, Slovenia, e-mail: bostjan.brumen@uni-mb.si;Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, Si-2000 Maribor, Slovenia, e-mail: bostjan.brumen@uni-mb.si;Tampere University of Technology, Pori, PO BOX 300, Fi-28101 Pori, Finland;Department of Informatics, Aristotle University, PO BOX 451, Thessaloniki, GR-54124, Greece
Venue:
Informatica
Year:
2007

Abstract

Classification, one of the tasks of data mining, provides a mapping from attributes (observations) to pre-specified classes. Classification models are built from underlying data, and in principle, models built with more data yield better results. However, the relationship between the amount of available data and model performance is not well understood, beyond the fact that the accuracy of a classification model shows diminishing improvements as data size grows. In this paper, we present an approach for the early assessment of extracted knowledge (classification models) in terms of performance (accuracy), based on the amount of data used. The assessment is based on observing the performance on smaller sample sizes. The solution is formally defined, and experiments show the correctness and utility of the approach.
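
The idea of assessing a model from its performance on smaller samples can be illustrated with a minimal sketch. The abstract does not state which functional form the paper uses, so the power-law error model err(n) = a * n^(-b), the function names, and the sample measurements below are all assumptions made for illustration only, not the authors' method. The sketch fits the curve to errors observed at small sample sizes and extrapolates the expected accuracy at a larger size.

```python
import math


def fit_power_law(sizes, errors):
    """Fit err(n) = a * n**(-b) by least squares in log-log space.

    The power-law form is an assumption for this sketch; the source
    abstract does not specify the model used in the paper.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(err) = log(a) - b * log(n), so a = exp(intercept), b = -slope
    return math.exp(intercept), -slope


def predict_accuracy(a, b, n):
    """Extrapolated accuracy at sample size n under the fitted curve."""
    return 1.0 - a * n ** (-b)


# Synthetic, hypothetical error rates generated from a known power law
# (a = 0.40, b = 0.25); these are NOT results from the paper.
sizes = [50, 100, 200, 400]
errors = [0.40 * n ** -0.25 for n in sizes]

a, b = fit_power_law(sizes, errors)
for n in (400, 800, 1600):
    print(n, round(predict_accuracy(a, b, n), 4))
```

Under this assumed model, each doubling of the sample size yields a smaller accuracy gain than the previous one, which matches the diminishing improvements described in the abstract. With real measurements the fit would be noisy, so the extrapolation should be read as a rough estimate rather than a guarantee.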