Classification with Unknown Classes

  • Authors:
  • Chetan Gupta; Song Wang; Umeshwar Dayal; Abhay Mehta

  • Affiliations:
  • Hewlett-Packard Labs; Hewlett-Packard Labs; Hewlett-Packard Labs; Hewlett-Packard Labs

  • Venue:
  • SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
  • Year:
  • 2009

Abstract

Given a data set $D$ such that $(X_i, y_i) \in D$ and $y_i \in \mathbb{R}$, we are interested in first dividing the range of $y_i$, i.e. $(y_{\max} - y_{\min})$ (where $y_{\max}$ is the maximum and $y_{\min}$ the minimum of all $y_i$ with $(X_i, y_i) \in D$), into contiguous ranges that can be thought of as classes, and then, for a new point $X_j$, predicting which range (class) it falls into. The problem is difficult because neither the size of each range nor the number of ranges is known a priori. This was a practical problem that arose when we wanted to predict the execution time of a query in a database. For our purposes an exact prediction was not required; a time range was sufficient, but the time ranges were unknown a priori. To solve this problem we introduce a binary tree structure called the Class Discovery Tree. We have used this technique successfully for predicting the execution times of queries, and it is slated for incorporation into a commercial, enterprise-level Database Management System. In this paper, we discuss our solution and validate it on two more real-life data sets. In the first, we compare our results with a naive approach, and in the second, with published results. In both cases, our approach is superior.
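
To make the problem statement concrete, the following is a minimal sketch of the two steps (dividing the range of y into contiguous ranges, then predicting the range for a new point). It is not the Class Discovery Tree and may differ from the naive approach evaluated in the paper: it assumes a fixed number of equal-width ranges `k` and a 1-nearest-neighbour predictor, both chosen here purely for illustration. The names `equal_width_edges`, `range_index`, `predict_range`, and the toy data are hypothetical.

```python
from bisect import bisect_right

def equal_width_edges(ys, k):
    """Return the k-1 interior edges that split [min(ys), max(ys)]
    into k contiguous equal-width ranges (k is an assumption here;
    in the paper's setting it is not known in advance)."""
    y_min, y_max = min(ys), max(ys)
    width = (y_max - y_min) / k
    return [y_min + i * width for i in range(1, k)]

def range_index(y, edges):
    """Map a value y to the index of the range (class) it falls into."""
    return bisect_right(edges, y)

def predict_range(x_new, data, edges):
    """Illustrative 1-nearest-neighbour predictor: return the range
    index of the training point whose X is closest to x_new."""
    def sq_dist(x):
        return sum((a - b) ** 2 for a, b in zip(x, x_new))
    _, nearest_y = min(data, key=lambda point: sq_dist(point[0]))
    return range_index(nearest_y, edges)

# Toy data set D of (X_i, y_i) pairs; y could be a query execution time.
D = [((1.0,), 2.0), ((2.0,), 4.0), ((8.0,), 30.0), ((9.0,), 42.0)]
edges = equal_width_edges([y for _, y in D], k=2)

print(edges)                          # interior edge between the two ranges
print(predict_range((1.2,), D, edges))
print(predict_range((7.0,), D, edges))
```

Fixing `k` and the range widths up front is exactly what the abstract says cannot be done in the real problem; the sketch only shows what "ranges as classes" means once some partition has been chosen.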