When classification becomes a problem: using branch-and-bound to improve classification efficiency

  • Authors: Armand Prieditis; Moontae Lee

  • Affiliations: Neustar Labs, Mountain View, CA (both authors)

  • Venue: MLDM'13, Proceedings of the 9th International Conference on Machine Learning and Data Mining in Pattern Recognition
  • Year: 2013

Abstract

In a typical machine learning classification task, there are two phases: training and prediction. This paper focuses on improving the efficiency of the prediction phase. When the number of classes is small, a linear search over the classes is an efficient way to find the most likely class. When the number of classes is large, however, linear search becomes inefficient. For example, applications such as geolocation or time-based classification might require millions of subclasses to fit the data. This paper describes a branch-and-bound method for finding the most likely class when the training examples can be partitioned into thousands of subclasses. To gauge its performance, we generated a synthetic set of random trees comprising billions of classes and evaluated branch-and-bound classification on them. Our results show that branch-and-bound classification is effective when the number of classes is large: it improves search efficiency logarithmically.
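
The abstract does not specify the paper's tree construction or bounding function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy one-dimensional setting in which each class is a point ("center"), the score of a class for a query `x` is the negative distance to that center, and classes are grouped in a balanced binary tree. All names (`build_tree`, `bound`, `branch_and_bound`) are hypothetical. Each tree node stores the range of centers beneath it, which gives an upper bound on the score of any class in that subtree; a best-first search then reaches the best class without scoring every class.

```python
import heapq
from itertools import count


def build_tree(centers):
    """Build a balanced binary tree over sorted class centers (toy assumption)."""
    centers = sorted(centers)
    if not centers:
        raise ValueError("need at least one class")

    def build(lo_i, hi_i):
        # Leaf: a single class, so lo == hi == its center.
        if hi_i - lo_i == 1:
            return {"lo": centers[lo_i], "hi": centers[lo_i], "children": None}
        mid = (lo_i + hi_i) // 2
        return {
            "lo": centers[lo_i],
            "hi": centers[hi_i - 1],
            "children": [build(lo_i, mid), build(mid, hi_i)],
        }

    return build(0, len(centers))


def bound(node, x):
    """Upper bound on the score (negative distance) of any class under node."""
    if x < node["lo"]:
        return -(node["lo"] - x)
    if x > node["hi"]:
        return -(x - node["hi"])
    return 0.0  # x lies inside the node's range


def branch_and_bound(root, x):
    """Best-first search; returns (best class center, number of nodes expanded)."""
    tie = count()  # tie-breaker so the heap never compares node dicts
    heap = [(-bound(root, x), next(tie), root)]
    expanded = 0
    while heap:
        _, _, node = heapq.heappop(heap)
        expanded += 1
        if node["children"] is None:
            # For a leaf the bound is its exact score, and no remaining
            # node has a better bound, so this class is optimal.
            return node["lo"], expanded
        for child in node["children"]:
            heapq.heappush(heap, (-bound(child, x), next(tie), child))
    raise ValueError("empty tree")


def linear_search(centers, x):
    """Baseline: score every class and keep the best."""
    return max(centers, key=lambda c: -abs(c - x))


centers = [i * 10 for i in range(1024)]
root = build_tree(centers)
best, expanded = branch_and_bound(root, 5003.2)
```

In this toy setting the search expands a number of nodes on the order of the tree depth, whereas `linear_search` scores all 1024 classes. How closely this matches the savings reported in the paper depends on the paper's own bounds and tree structure, which the abstract does not describe.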