On Efficient Construction of Decision Trees from Large Databases

Authors:
Hung Son Nguyen
Venue:
RSCTC '00 Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing
Year:
2000

Abstract

The main task in decision tree construction algorithms is to find the "best partition" of the set of objects. In this paper, we investigate the problem of optimal binary partition of a continuous attribute for large data sets stored in relational databases. The critical factor for the time complexity of algorithms solving this problem is the number of simple SQL queries necessary to construct such partitions. The straightforward approach to optimal partition selection needs at least O(N) queries, where N is the number of pre-assumed partitions of the search space. We show some properties of optimization measures related to discernibility between objects that allow a partition very close to optimal to be constructed using only O(log N) simple queries.
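
To make the query-counting argument concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical table objects(a, d) holding one continuous attribute a and a decision class d, and it assumes the discernibility measure of a cut is the number of object pairs from different classes that the cut separates. Each evaluation of the measure costs one simple GROUP BY query. The function exhaustive_best_cut is the straightforward O(N) approach; narrowing_best_cut is an illustrative interval-narrowing heuristic that probes only a few cuts per round. The heuristic finds the optimum only when the measure is well behaved along the attribute (it can miss it otherwise), and all table, column, and function names here are assumptions made for illustration.

```python
import sqlite3


def make_demo_db():
    """Hypothetical demo data: a in 0..99, class 'A' below 50, else 'B'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (a REAL, d TEXT)")
    rows = [(float(i), "A" if i < 50 else "B") for i in range(100)]
    conn.executemany("INSERT INTO objects VALUES (?, ?)", rows)
    return conn


def class_totals(conn):
    """Class counts over the whole table (issued once, up front)."""
    return dict(conn.execute("SELECT d, COUNT(*) FROM objects GROUP BY d"))


def discernibility(conn, cut, totals):
    """Pairs of objects from different classes separated by `cut`.

    Costs exactly one simple SQL query per evaluated cut.
    """
    left = dict(conn.execute(
        "SELECT d, COUNT(*) FROM objects WHERE a < ? GROUP BY d", (cut,)))
    n_left = sum(left.values())
    n_right = sum(totals.values()) - n_left
    # Pairs split by the cut that share a class are not discerned.
    same_class = sum(left.get(c, 0) * (totals[c] - left.get(c, 0))
                     for c in totals)
    return n_left * n_right - same_class


def exhaustive_best_cut(conn, cuts, totals):
    """Straightforward approach: one query for each of the N candidate cuts."""
    best_score, best_cut = max((discernibility(conn, c, totals), c)
                               for c in cuts)
    return best_cut, best_score, len(cuts)


def narrowing_best_cut(conn, cuts, totals, k=4):
    """Illustrative heuristic: probe about k cuts per round, then shrink
    the index range around the best probe. Requires k >= 3 so that the
    range strictly shrinks each round.
    """
    cache = {}  # cut index -> score; its size is the number of queries

    def score(i):
        if i not in cache:
            cache[i] = discernibility(conn, cuts[i], totals)
        return cache[i]

    lo, hi = 0, len(cuts) - 1
    while hi - lo > k:
        step = (hi - lo) // k
        best = max(range(lo, hi + 1, step), key=score)
        lo, hi = max(lo, best - step), min(hi, best + step)
    best = max(range(lo, hi + 1), key=score)
    return cuts[best], score(best), len(cache)


if __name__ == "__main__":
    conn = make_demo_db()
    totals = class_totals(conn)
    cuts = [i + 0.5 for i in range(99)]  # N = 99 candidate cuts
    print(exhaustive_best_cut(conn, cuts, totals))
    print(narrowing_best_cut(conn, cuts, totals))
```

On this demo data both functions return the same cut, and the third element of each result shows how many measure queries were issued, which is N for the exhaustive search and far fewer for the narrowing heuristic.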