Some data mining techniques, such as discretization of continuous attributes or decision tree induction, are based on searching for an optimal partition of the data with respect to some optimization criterion. We investigate the problem of searching for an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of I/O database operations needed to construct such partitions. In our approach, the basic operators are defined by queries on the number of objects whose continuous attribute values fall within a given real-valued interval. We assume that the answer time for such queries does not depend on the interval length. The straightforward approach to selecting the optimal partition (with respect to a given measure) requires O(N) basic queries, where N is the number of preassumed partitions in the search space. We show properties of the basic optimization measures that make it possible to reduce the size of the search space. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to the optimal one.
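The straightforward O(N) approach mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `CountingOracle`, the function `best_cut_linear`, and the choice of weighted class entropy as the quality measure are all assumptions made for illustration, and an in-memory sorted list stands in for the relational database. Each call to `count` plays the role of one basic interval query, so the query counter shows the linear cost that the O(log N) method is meant to avoid.

```python
import bisect
import math


class CountingOracle:
    """Hypothetical stand-in for an RDB that answers interval count queries."""

    def __init__(self, values, labels):
        self.classes = sorted(set(labels))
        # One sorted list of attribute values per decision class.
        self._by_class = {
            c: sorted(v for v, l in zip(values, labels) if l == c)
            for c in self.classes
        }
        self.queries = 0  # number of basic queries issued so far

    def count(self, lo, hi):
        """Per-class number of objects with lo <= value < hi (one basic query)."""
        self.queries += 1
        return [
            bisect.bisect_left(vs, hi) - bisect.bisect_left(vs, lo)
            for vs in self._by_class.values()
        ]


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def split_entropy(left, right):
    """Size-weighted entropy of a binary partition (assumed measure; lower is better)."""
    nl, nr = sum(left), sum(right)
    n = nl + nr
    return (nl / n) * entropy(left) + (nr / n) * entropy(right)


def best_cut_linear(oracle, cuts, lo, hi):
    """Straightforward search: one basic query per candidate cut, O(N) in total."""
    total = oracle.count(lo, hi)
    best = None
    for c in cuts:
        left = oracle.count(lo, c)
        # The right side follows from the totals, so no extra query is needed.
        right = [t - l for t, l in zip(total, left)]
        score = split_entropy(left, right)
        if best is None or score < best[1]:
            best = (c, score)
    return best


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
    oracle = CountingOracle(values, labels)
    cuts = [x + 0.5 for x in range(1, 8)]
    print(best_cut_linear(oracle, cuts, 0, 9), oracle.queries)
```

The sketch covers only the baseline search. The search-space reduction and the O(log N) construction described in the abstract depend on properties of the measures that are not stated here, so they are not reproduced.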