Classification with Unknown Classes

Authors:
Chetan Gupta;Song Wang;Umeshwar Dayal;Abhay Mehta
Affiliations:
Hewlett-Packard Labs;Hewlett-Packard Labs;Hewlett-Packard Labs;Hewlett-Packard Labs
Venue:
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Year:
2009



Abstract

Given a data set $D$ such that $(X_i, y_i) \in D$ and $y_i \in \mathbb{R}$, we are interested in first dividing the range of $y_i$, i.e. $(y_{\max} - y_{\min})$, where $y_{\max}$ and $y_{\min}$ are the maximum and minimum of all $y_i$ with $(X_i, y_i) \in D$, into contiguous ranges that can be thought of as classes, and then, for a new point $X_j$, predicting which range (class) it falls into. The problem is difficult because neither the size of each range nor the number of ranges is known a priori. This was a practical problem that arose when we wanted to predict the execution time of a query in a database. For our purposes, an exact prediction was not required; a time range was sufficient, and the time ranges were unknown a priori. To solve this problem we introduce a binary tree structure called the Class Discovery Tree. We have used this technique successfully for predicting the execution times of queries, and it is slated for incorporation into a commercial, enterprise-level database management system. In this paper, we discuss our solution and validate it on two more real-life data sets. In the first, we compare our results with a naive approach, and in the second, with published results. In both cases, our approach is superior.
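
To make the problem statement concrete, the following minimal Python sketch shows what "dividing the range into contiguous ranges" and "predicting which range a value falls into" could look like when the cut points are simply fixed in advance. This is not the Class Discovery Tree, whose construction the abstract does not describe. The equal-width split, the function names `equal_width_cuts` and `range_labels`, and the sample execution times are all assumptions made for illustration.

```python
from bisect import bisect_right


def equal_width_cuts(ys, k):
    """Return the k-1 interior cut points that split [min(ys), max(ys)]
    into k equal-width contiguous ranges (an illustrative assumption,
    not the paper's method)."""
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def range_labels(ys, cut_points):
    """Map each y to the index of the contiguous range it falls into.
    Range 0 lies below the first cut point; the last range lies at or
    above the final cut point."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, y) for y in ys]


# Hypothetical query execution times in seconds.
times = [0.5, 2.0, 3.5, 40.0, 41.0, 100.0]
cuts = equal_width_cuts(times, 4)
labels = range_labels(times, cuts)
print(cuts)
print(labels)
```

A fixed split like this requires choosing the number of ranges `k` and their widths beforehand, which is exactly the information the abstract says is unknown a priori. With the skewed sample above, one range is left empty and the three shortest queries collapse into a single class, which hints at why ranges discovered from the data may be preferable.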