Discretization numbers for multiple-instances problem in relational database

Authors:
Rayner Alfred;Dimitar Kazakov
Affiliations:
University of York, Computer Science Department, Heslington, York, United Kingdom;University of York, Computer Science Department, Heslington, York, United Kingdom
Venue:
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
Year:
2007

Abstract

Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, because an individual may have multiple associated records in a non-target table and because relations between tables may be nondeterminate. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality; the resulting loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In our experiments, we study the effect of taking the one-to-many association into account when discretizing continuous numbers. We implement a new discretization method, called the entropy-instance-based discretization method, and evaluate it with respect to C4.5 on three variants of a well-known multi-relational database (Mutagenesis), in which numeric attributes play an important role. The empirical results show that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.
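
To make the one-to-many setting concrete, the following is a minimal sketch of plain entropy-based cut-point selection applied to a non-target table. It is not the entropy-instance-based method itself, whose details the abstract does not give. The table contents, the names `target`, `atoms`, `entropy`, and `best_cut_point`, and the choice of a single binary cut are all assumptions made for illustration. Each child record simply inherits the class label of its parent individual and is then treated as an independent example, which is the structure-insensitive baseline the abstract contrasts against.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the binary split with lowest class entropy."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        cut = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best[1]:
            best = (cut, score)
    return best


# Hypothetical target table: one row per individual, with its class label.
target = {"m1": "active", "m2": "inactive", "m3": "active"}

# Hypothetical non-target table: several numeric records per individual.
atoms = [
    ("m1", -0.12), ("m1", -0.10),
    ("m2", 0.05), ("m2", 0.14),
    ("m3", -0.11), ("m3", -0.02),
]

# Structure-insensitive view: every record inherits its parent's label.
values = [charge for _, charge in atoms]
labels = [target[individual] for individual, _ in atoms]

cut, score = best_cut_point(values, labels)
print(f"cut point: {cut:.3f}, weighted entropy: {score:.3f}")
```

In this sketch an individual with many records contributes more to the entropy estimate than one with few, which is one way the multiple-instance problem can bias a discretization that ignores the relational structure.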