Approximated measures in construction of decision trees from large databases

Authors:
Hung Son Nguyen;Sinh Hoa Nguyen
Affiliations:
Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland;Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
Venue:
Design and application of hybrid intelligent systems
Year:
2003

Abstract

We present an efficient method for constructing decision trees from a large data set that is assumed to be stored on a database server and accessible through SQL queries. The method minimizes the total time of data transmission between client and server. Based on a divide-and-conquer search strategy, it minimizes the number of simple queries needed to find the best cuts. To make this possible, we develop approximate measures, defined on intervals of attribute values, that evaluate the chance that the best cut belongs to a given interval. We also propose applications of this approach to discretization and to the construction of soft decision trees, a novel classifier model.
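The divide-and-conquer idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the table `data(a, d)`, the function names, the parameters `k` and `width`, and above all the interval score are assumptions made for illustration. Each step issues `k` simple `COUNT(*)` queries, scores every sub-interval from class counts alone, and continues in the most promising one. Cut quality is measured here by the number of pairs of objects from different classes that a cut separates; the interval score (best endpoint quality plus the number of conflicting pairs inside the interval) is a crude stand-in for the approximate measures mentioned in the abstract, and it may discard the interval that holds the best cut.

```python
import sqlite3
from collections import Counter


def class_counts(conn, lo, hi):
    """One simple query: class distribution of rows with lo <= a < hi."""
    rows = conn.execute(
        "SELECT d, COUNT(*) FROM data WHERE a >= ? AND a < ? GROUP BY d",
        (lo, hi),
    ).fetchall()
    return Counter(dict(rows))


def discernibility(left, right):
    """Pairs of objects from different classes separated by a cut."""
    same = sum(n * right[c] for c, n in left.items())
    return sum(left.values()) * sum(right.values()) - same


def conflicts(inside):
    """Pairs of objects from different classes inside one interval."""
    total = sum(inside.values())
    return (total * total - sum(n * n for n in inside.values())) // 2


def narrow_cut_interval(conn, lo, hi, k=4, width=1.0):
    """Shrink [lo, hi) around a promising cut; hi must exceed every value of a.

    Returns (lo, hi, number_of_queries). The score is an illustrative
    heuristic (an assumption), not the measure defined in the paper.
    """
    left_out, right_out = Counter(), Counter()  # counts outside [lo, hi)
    queries = 0
    while hi - lo > width:
        step = (hi - lo) / k
        bounds = [lo + i * step for i in range(k)] + [hi]
        parts = [class_counts(conn, bounds[i], bounds[i + 1]) for i in range(k)]
        queries += k
        # Quality of a cut placed exactly at each sub-interval boundary.
        quality = [
            discernibility(sum(parts[:i], left_out), sum(parts[i:], right_out))
            for i in range(k + 1)
        ]
        # Heuristic score per sub-interval, computed from counts only.
        score = [
            max(quality[i], quality[i + 1]) + conflicts(parts[i])
            for i in range(k)
        ]
        best = max(range(k), key=score.__getitem__)
        left_out = sum(parts[:best], left_out)
        right_out = sum(parts[best + 1:], right_out)
        lo, hi = bounds[best], bounds[best + 1]
    return lo, hi, queries


def demo_connection():
    """In-memory toy table: a = 0..99, class 1 exactly when a >= 37."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (a REAL, d INTEGER)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [(float(v), int(v >= 37)) for v in range(100)],
    )
    return conn


if __name__ == "__main__":
    print(narrow_cut_interval(demo_connection(), 0.0, 100.0))
```

In this sketch only class counts travel from server to client, never the rows themselves, which is the property the abstract emphasizes. An exhaustive search would instead need the class distribution at every candidate cut.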