Scalable techniques for the application of machine learning to large datasets

  • Authors:
  • Edward Y. Chang; Navneet Panda

  • Affiliations:
  • University of California, Santa Barbara; University of California, Santa Barbara

  • Venue:
  • Scalable techniques for the application of machine learning to large datasets
  • Year:
  • 2006

Abstract

Concept learning finds application in diverse domains such as image retrieval, protein/gene classification, and face recognition. Kernel-based techniques such as support vector machines (SVMs) have become increasingly popular due to their strong theoretical foundations and remarkable empirical performance across diverse data domains. The widespread application of SVMs to real-world datasets, however, has lagged because of scalability issues in both the training and the classification stages. In this thesis, we propose solutions for rapid concept learning and for retrieval of the top-k set of relevant instances. Our approach reduces the training time of SVMs in the multi-category scenario by intelligently pruning training instances. Our retrieval solution extends naturally to the active learning scenario, where the most ambiguously labeled instances are of interest. We also develop an approach for speeding up nearest-neighbor-based classification, which continues to be popular and is widely used for many real-world classification tasks.
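The abstract does not describe the thesis's actual indexing or pruning methods. What follows is therefore only a minimal brute-force sketch of the two query types it mentions, and it assumes a linear SVM with hypothetical weights `w` and bias `b`. Top-k retrieval returns the instances with the largest decision values f(x). Active-learning selection returns the instances with the smallest |f(x)|, meaning those closest to the separating hyperplane and thus the most ambiguously labeled. The function names `decision_value`, `top_k_relevant`, and `top_k_ambiguous` are illustrative and do not come from the thesis. Each query scans every instance, and that per-query cost is the kind a scalable method would aim to avoid.

```python
import heapq
from typing import List, Sequence

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear SVM decision function f(x) = w . x + b (linear kernel assumed)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def top_k_relevant(w: Vector, b: float, data: List[Vector], k: int) -> List[int]:
    """Indices of the k instances with the largest f(x), in descending order.

    Hypothetical helper: a naive linear scan, not the thesis's retrieval method.
    """
    return heapq.nlargest(k, range(len(data)), key=lambda i: decision_value(w, b, data[i]))


def top_k_ambiguous(w: Vector, b: float, data: List[Vector], k: int) -> List[int]:
    """Indices of the k instances with the smallest |f(x)|, i.e. nearest the hyperplane.

    Hypothetical helper: uncertainty-style selection for active learning via a linear scan.
    """
    return heapq.nsmallest(k, range(len(data)), key=lambda i: abs(decision_value(w, b, data[i])))


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    data = [[3.0, 0.0], [0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [-2.0, 0.0]]
    print(top_k_relevant(w, b, data, 2))   # [0, 2]  (f = 3.0 and 1.0)
    print(top_k_ambiguous(w, b, data, 2))  # [1, 3]  (|f| = 0.1 and 0.2)
```

The same decision function serves both queries, and only the ranking key changes. This fits the abstract's remark that the retrieval solution extends naturally to active learning.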