Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, because an individual can have multiple associated records in a non-target table and the relations between tables can be nondeterminate. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, to reduce the continuous domains to more manageable symbolic domains of low cardinality; the resulting loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In our experiments, we study the effect of taking the one-to-many association into account when discretizing continuous numbers. We implement a new discretization method, called the entropy-instance-based discretization method, and evaluate it with respect to C4.5 on three varieties of a well-known multi-relational database (Mutagenesis), in which numeric attributes play an important role. The empirical results demonstrate that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.
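To make the one-to-many issue concrete, the following minimal Python sketch compares a plain entropy-based binary cut with a variant that accounts for how many records each individual contributes. It is not the paper's entropy-instance-based method, whose details the abstract does not give. The function names (`entropy`, `best_cut`), the `per_individual` flag, the 1/n row weighting, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def entropy(class_weights):
    """Shannon entropy (in bits) of a weighted class distribution."""
    total = sum(class_weights.values())
    if total <= 0:
        return 0.0
    return -sum((w / total) * log2(w / total)
                for w in class_weights.values() if w > 0)


def best_cut(records, labels, per_individual=False):
    """Return the cut point that minimises the weighted class entropy.

    records: (individual_id, value) rows from the non-target table.
    labels:  individual_id -> class label from the target table.
    per_individual: if True, a row of an individual with n rows gets
        weight 1/n (an illustrative assumption, not the paper's method).
    """
    rows_per_individual = defaultdict(int)
    for ind, _ in records:
        rows_per_individual[ind] += 1

    # Each row inherits the class of the individual it belongs to.
    rows = sorted(
        (value, labels[ind],
         1.0 / rows_per_individual[ind] if per_individual else 1.0)
        for ind, value in records
    )

    total = defaultdict(float)
    for _, cls, weight in rows:
        total[cls] += weight
    grand = sum(total.values())

    left = defaultdict(float)
    best_score, best_point = float("inf"), None
    for i in range(len(rows) - 1):
        value, cls, weight = rows[i]
        left[cls] += weight
        next_value = rows[i + 1][0]
        if next_value == value:
            continue  # no boundary between equal values
        right = {c: max(0.0, total[c] - left[c]) for c in total}
        left_weight = sum(left.values())
        score = (left_weight * entropy(left)
                 + (grand - left_weight) * entropy(right)) / grand
        if score < best_score:
            best_score, best_point = score, (value + next_value) / 2
    return best_point


# Toy data: individual F contributes six rows, the others one row each.
labels = {"A": "pos", "B": "pos", "C": "pos",
          "D": "neg", "E": "neg", "F": "pos"}
records = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]
records += [("F", v) for v in (6, 7, 8, 9, 10, 11)]

print(best_cut(records, labels))                       # row-level cut
print(best_cut(records, labels, per_individual=True))  # individual-aware cut
```

On this toy data the row-level criterion places the cut at 5.5, because the six rows of individual F dominate the class counts. The individual-aware variant places it at 3.5, since every individual then carries the same total weight.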