Given a data set $D$ such that $(X_i, y_i) \in D$ and $y_i \in \mathbb{R}$, we are interested in first dividing the range of the $y_i$, i.e. $y_{\max} - y_{\min}$ (where $y_{\max}$ and $y_{\min}$ are, respectively, the maximum and minimum of all $y_i$ with $(X_i, y_i) \in D$), into contiguous ranges that can be thought of as classes, and then, for a new point $X_j$, predicting which range (class) it falls into. The problem is difficult because neither the size of each range nor the number of ranges is known a priori. This practical problem arose when we wanted to predict the execution time of a database query. For our purposes an exact prediction was not required; a time range was sufficient, but the appropriate time ranges were not known a priori. To solve this problem we introduce a binary tree structure called the Class Discovery Tree. We have used this technique successfully to predict query execution times, and it is slated for incorporation into a commercial, enterprise-level Database Management System. In this paper, we discuss our solution and validate it on two additional real-life data sets. On the first, we compare our results with a naive approach; on the second, with previously published results. In both cases, our approach is superior.
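The abstract does not say how the Class Discovery Tree chooses its splits. To make the problem setup concrete, the following Python sketch turns a continuous target into contiguous range classes using a binary tree. It relies on a simple splitting rule that is an assumption and not the authors' method: recursively split the sorted $y_i$ at the widest gap between consecutive values. The names `build_range_tree`, `leaf_ranges`, and `class_of`, and the parameters `min_size` and `max_depth`, are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class RangeNode:
    """One node of a hypothetical binary tree over the target range [lo, hi]."""
    lo: float
    hi: float
    left: Optional["RangeNode"] = None
    right: Optional["RangeNode"] = None


def build_range_tree(ys: Sequence[float], min_size: int = 2, max_depth: int = 3) -> RangeNode:
    """Recursively split the sorted target values at their widest gap.

    The widest-gap rule is an illustrative assumption, not the
    splitting criterion of the Class Discovery Tree.
    """
    if not ys:
        raise ValueError("need at least one target value")
    vals_sorted = sorted(ys)

    def grow(vals: List[float], depth: int) -> RangeNode:
        node = RangeNode(vals[0], vals[-1])
        # Stop at the depth limit or when a split could not leave min_size values per side.
        if depth >= max_depth or len(vals) < 2 * min_size:
            return node
        best_i, best_gap = None, 0.0
        # Candidate split index i puts vals[:i] left and vals[i:] right.
        for i in range(min_size, len(vals) - min_size + 1):
            gap = vals[i] - vals[i - 1]
            if gap > best_gap:
                best_i, best_gap = i, gap
        if best_i is None:  # every candidate gap is zero: nothing to separate
            return node
        node.left = grow(vals[:best_i], depth + 1)
        node.right = grow(vals[best_i:], depth + 1)
        return node

    return grow(vals_sorted, 0)


def leaf_ranges(node: RangeNode) -> List[Tuple[float, float]]:
    """Leaves, read left to right, are the contiguous ranges used as classes."""
    if node.left is None or node.right is None:
        return [(node.lo, node.hi)]
    return leaf_ranges(node.left) + leaf_ranges(node.right)


def class_of(y: float, ranges: List[Tuple[float, float]]) -> int:
    """Map a target value to the index of the range (class) whose start is at or below it."""
    starts = [lo for lo, _ in ranges]
    return max(0, bisect_right(starts, y) - 1)


if __name__ == "__main__":
    # Made-up execution times in seconds, for illustration only.
    times = [0.1, 0.2, 0.3, 5.0, 5.5, 6.0, 60.0, 65.0]
    ranges = leaf_ranges(build_range_tree(times))
    print(ranges)                  # [(0.1, 0.3), (5.0, 6.0), (60.0, 65.0)]
    print(class_of(5.2, ranges))   # 1
```

Once each training target $y_i$ has been mapped to a range index, any standard classifier could be trained on the resulting (feature vector, class) pairs to predict the range for a new point $X_j$. The sketch leaves that step out and makes no claim about how the paper performs it.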