Incremental Exemplar Learning Schemes for Classification on Embedded Devices

  • Authors:
  • Ankur Jain; Daniel Nikovski

  • Affiliations:
  • Mitsubishi Electric Research Laboratory, Cambridge, MA 02139 (both authors)

  • Venue:
  • ECML PKDD '08: Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
  • Year:
  • 2008

Abstract

In this paper, we focus on the data classification problem when the classifier operates on an embedded device (e.g., fault detection in device condition-monitoring data streams). Memory-based classifiers are an excellent choice in such cases; however, an embedded device is unlikely to be able to hold a large training dataset in memory (one that could keep growing as new training data with new concepts arrive). A viable option then is to employ exemplar learning (EL) techniques to find a training subset comprising a few carefully selected exemplars of high functional value that fit in memory and effectively delineate the class boundaries. We propose two novel incremental EL schemes that, unlike traditional EL approaches [3], are (1) incremental (they naturally incorporate new training data streams), (2) offer ordered removal of instances (they can be customized to obtain exemplar sets of any user-defined size), and (3) robust (the exemplar sets generalize to other classifiers as well). Our proposed methods are as follows:

  • EBEL (Entropy-Based EL): removes instances from the training set based on their information content. Instead of using an ad hoc ranking scheme, it removes the training instance whose removal causes the least drop in the conditional entropy of the class indicator variable, ensuring minimum loss of information.
  • ABEL (AUC-Based EL): prunes data based on AUC (area under the ROC curve) performance. ABEL uses a validation set and prunes the instance whose removal causes the least drop in the AUC computed on this validation set.

We show that our schemes efficiently incorporate new training datasets while maintaining high-quality exemplar sets of any user-defined size. We present a comprehensive experimental analysis showing excellent classification-accuracy versus memory-usage tradeoffs for our proposed methods.
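To make the greedy, ordered-removal idea behind EBEL and ABEL concrete, the sketch below shows one plausible reading of the abstract: repeatedly discard the exemplar whose removal causes the least drop in a score, where the score is either an estimated conditional entropy of the class variable (EBEL-style) or the AUC of a memory-based classifier on a validation set (ABEL-style). This is an assumption-laden illustration, not the authors' implementation; the k-NN posterior estimate, the function names, and the demo data are all invented here, and the incremental merging of new data streams is not shown.

```python
"""Greedy exemplar pruning in the spirit of EBEL and ABEL.

Illustrative sketch only: how the paper actually estimates the conditional
entropy and incorporates new training batches is not reproduced here.
Assumes a k-NN memory-based classifier, binary labels, and NumPy arrays.
"""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def _knn(exemplar_X, exemplar_y, k=3):
    # Memory-based classifier built on the current exemplar set.
    knn = KNeighborsClassifier(n_neighbors=min(k, len(exemplar_y)))
    return knn.fit(exemplar_X, exemplar_y)


def conditional_entropy(exemplar_X, exemplar_y, X):
    """H(Y | X) estimated from k-NN class posteriors over the exemplar set."""
    probs = np.clip(_knn(exemplar_X, exemplar_y).predict_proba(X), 1e-12, 1.0)
    return -np.mean(np.sum(probs * np.log(probs), axis=1))


def auc(exemplar_X, exemplar_y, val_X, val_y):
    """AUC on a validation set of a k-NN classifier built on the exemplar set."""
    scores = _knn(exemplar_X, exemplar_y).predict_proba(val_X)[:, 1]
    return roc_auc_score(val_y, scores)


def greedy_prune(X, y, target_size, score_fn):
    """Ordered removal: repeatedly drop the exemplar whose removal causes the
    least drop in score_fn, until target_size exemplars remain."""
    keep = list(range(len(y)))
    while len(keep) > target_size:
        best_i, best_s = None, -np.inf
        for i in keep:
            trial = [j for j in keep if j != i]
            s = score_fn(X[trial], y[trial])
            if s > best_s:
                best_i, best_s = i, s
        keep.remove(best_i)
    return np.array(keep)


if __name__ == "__main__":
    X, y = make_classification(n_samples=120, n_features=5, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

    # EBEL-style: score a candidate exemplar set by the conditional entropy of
    # the class variable evaluated over the full training set.
    ebel = greedy_prune(X_tr, y_tr, 20,
                        lambda ex_X, ex_y: conditional_entropy(ex_X, ex_y, X_tr))

    # ABEL-style: score a candidate exemplar set by its AUC on a validation set.
    abel = greedy_prune(X_tr, y_tr, 20,
                        lambda ex_X, ex_y: auc(ex_X, ex_y, X_val, y_val))

    print("EBEL exemplars kept:", len(ebel), "| ABEL exemplars kept:", len(abel))
```

Because removal is one instance at a time, the intermediate sets form a ranking, so an exemplar set of any user-defined size can be read off the removal order; the scoring functions above are stand-ins for the paper's actual criteria.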