Examination and comparison of conflicting data in granulated datasets: Equal width interval vs. equal frequency interval

Authors:
Chienhsing Wu; Shu-Chen Kao; Koji Okuhara
Affiliations:
Department of Information Management, National University of Kaohsiung, 700 Kaohsiung University Rd., Nanzih District, Kaohsiung 811, Taiwan; Department of Information Management, Kun Shan University, 949 Dawan Rd., Yung-Kung District, Tainan 71003, Taiwan; Graduate School of Information Science and Technology, Osaka University, 1-1 Yamadaoka, Suita, Osaka 565-0871, Japan
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Knowledge discovery from databases requires a comprehensive pre-examination to ensure that the granulated datasets produced by converting continuous data are consistent. Different granulation techniques may yield different numbers of conflicting data in a granulated dataset. This work examines and compares the performance of two granulation techniques, equal width interval (EWI) and equal frequency interval (EFI), and explores the relationship between granulation performance and dataset size, number of attributes, and number of classes. Eighteen continuous datasets are examined. Experimental results indicate that (1) of the 18 datasets examined, 7 contained conflicting data under EWI and 8 under EFI, suggesting that roughly 40% of the granulated datasets contained conflicting data; (2) almost 22% of the datasets had more than 20% conflicting data; (3) no notable difference existed between EWI and EFI in granulation performance; (4) the production of conflicting data by EWI and EFI did not differ remarkably with respect to dataset size or number of classes; and (5) having more than 12 attributes reduced the number of conflicting data under both EWI and EFI.
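
To make the two granulation techniques concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "conflicting" record is one whose granulated attribute values match those of another record with a different class label; the abstract does not define the term, so this definition is an assumption. The function names (equal_width_bins, equal_frequency_bins, granulate, count_conflicts), the number of intervals k, and the toy data are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def equal_width_bins(values, k):
    """EWI: split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in interval k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """EFI: choose cut points so each interval holds roughly n / k values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(j * n) // k] for j in range(1, k)]
    # Using cut points (not ranks) keeps identical values in the same interval.
    return [bisect_right(cuts, v) for v in values]


def granulate(rows, bin_fn, k):
    """Granulate every attribute (column) of rows independently."""
    columns = list(zip(*rows))
    binned = [bin_fn(list(col), k) for col in columns]
    return list(zip(*binned))


def count_conflicts(granulated, labels):
    """Count records sharing granulated values with a differently labeled record.

    Assumed definition of conflicting data; see the lead-in.
    """
    groups = defaultdict(list)
    for key, label in zip(granulated, labels):
        groups[key].append(label)
    return sum(len(ls) for ls in groups.values() if len(set(ls)) > 1)


# Hypothetical toy data: one continuous attribute, two classes.
rows = [[1.0], [2.0], [3.0], [10.0]]
labels = ["a", "a", "b", "b"]

ewi_conflicts = count_conflicts(granulate(rows, equal_width_bins, 2), labels)
efi_conflicts = count_conflicts(granulate(rows, equal_frequency_bins, 2), labels)
print(ewi_conflicts, efi_conflicts)  # prints: 3 0
```

On this toy attribute the outlier 10.0 stretches the equal-width intervals, so three records with mixed labels collapse into one interval, while the equal-frequency split keeps the classes apart. This only illustrates the mechanism; it says nothing about the 18 datasets in the study, where the two techniques showed no notable difference.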