On Efficient Handling of Continuous Attributes in Large Data Bases

Author: Hung Son Nguyen
Affiliation: Institute of Mathematics, Warsaw University, ul. Banacha 2, 02-097, Warsaw, Poland
Venue: Fundamenta Informaticae
Year: 2001

Abstract

Some data mining techniques, such as discretization of continuous attributes and decision tree induction, are based on searching for an optimal partition of the data with respect to some optimization criterion. We investigate the problem of searching for an optimal binary partition of a continuous attribute domain in the case of large data sets stored in relational data bases (RDB). The critical factor in the time complexity of algorithms solving this problem is the number of database I/O operations needed to construct such partitions. In our approach, the basic operators are defined by queries on the number of objects whose continuous-attribute values fall within given real intervals. We assume that the answer time for such queries does not depend on the interval length. The straightforward approach to selecting the optimal partition (with respect to a given measure) requires O(N) basic queries, where N is the number of predetermined partition parts in the search space. We show properties of the basic optimization measures that make it possible to reduce the size of the search space. Moreover, we prove that using only O(log N) simple queries, one can construct a partition very close to the optimal one.
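The cost model above can be illustrated with a small sketch. It is not the author's algorithm; everything named here (`IntervalCounter`, `discernibility`, `best_cut_exhaustive`, `best_cut_halving`, table `t`, columns `a` and `label`) is hypothetical. An in-memory SQLite table stands in for the relational data base, though it does not necessarily satisfy the abstract's assumption that query time is independent of interval length. The sketch assumes the quality of a cut is measured by discernibility: the number of object pairs from different decision classes that lie on opposite sides of the cut. Evaluating one cut costs one interval-count query per decision class. The exhaustive search therefore issues O(N) such evaluations. The halving search is a swapped-in technique that uses only O(log N) evaluations, and it is exact only if the measure is strictly unimodal over the sorted candidate cuts.

```python
import sqlite3


class IntervalCounter:
    """Hypothetical basic operator: count rows of one decision class whose
    attribute value lies in [lo, hi). Every call counts as one query."""

    def __init__(self, conn, table, attr, label):
        # Identifiers are interpolated into SQL, so they must come from trusted code.
        self.conn, self.table, self.attr, self.label = conn, table, attr, label
        self.queries = 0

    def count(self, cls, lo=None, hi=None):
        self.queries += 1
        conds, params = [f"{self.label} = ?"], [cls]
        if lo is not None:
            conds.append(f"{self.attr} >= ?")
            params.append(lo)
        if hi is not None:
            conds.append(f"{self.attr} < ?")
            params.append(hi)
        sql = f"SELECT COUNT(*) FROM {self.table} WHERE " + " AND ".join(conds)
        return self.conn.execute(sql, params).fetchone()[0]


def discernibility(counter, cut, classes, totals):
    """Assumed measure: pairs from different classes separated by `cut`.
    Costs one count query per decision class."""
    left = [counter.count(k, hi=cut) for k in classes]
    right = [tot - lc for tot, lc in zip(totals, left)]
    n_left, n_right = sum(left), sum(right)
    # All left-right pairs minus the pairs that share a class.
    return n_left * n_right - sum(lc * rc for lc, rc in zip(left, right))


def best_cut_exhaustive(counter, cuts, classes, totals):
    """Evaluate every candidate cut: O(N) evaluations."""
    return max(cuts, key=lambda c: discernibility(counter, c, classes, totals))


def best_cut_halving(counter, cuts, classes, totals):
    """Interval halving over sorted cuts: O(log N) evaluations.
    Exact only if the measure is strictly unimodal over `cuts` (assumption)."""
    lo, hi = 0, len(cuts) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        here = discernibility(counter, cuts[mid], classes, totals)
        nxt = discernibility(counter, cuts[mid + 1], classes, totals)
        if here < nxt:
            lo = mid + 1  # still climbing: the peak lies to the right of mid
        else:
            hi = mid      # not climbing: the peak is at mid or to its left
    return cuts[lo]


# Toy table: class 0 holds values 0..49, class 1 holds values 50..99.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a REAL, label INTEGER)")
rows = [(v, 0) for v in range(50)] + [(v, 1) for v in range(50, 100)]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

counter = IntervalCounter(conn, "t", "a", "label")
classes = [0, 1]
totals = [counter.count(k) for k in classes]  # class sizes, queried once
cuts = [v + 0.5 for v in range(99)]           # N = 99 candidate cuts

counter.queries = 0
exact = best_cut_exhaustive(counter, cuts, classes, totals)
n_exhaustive = counter.queries

counter.queries = 0
approx = best_cut_halving(counter, cuts, classes, totals)
n_halving = counter.queries

print(exact, n_exhaustive)  # 49.5 198
print(approx, n_halving)    # 49.5 24
```

On this toy table, both searches return the cut 49.5. The halving search issues 24 count queries, compared with 198 for the exhaustive scan. When the measure has several local maxima along the cuts, the halving search can stop at a suboptimal cut.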