Examination and comparison of conflicting data in granulated datasets: Equal width interval vs. equal frequency interval

  • Authors:
  • Chienhsing Wu; Shu-Chen Kao; Koji Okuhara

  • Affiliations:
  • Department of Information Management, National University of Kaohsiung, 700 Kaohsiung University Rd., Nanzih District, Kaohsiung 811, Taiwan; Department of Information Management, Kun Shan University, 949 Dawan Rd., Yung-Kung District, Tainan 71003, Taiwan; Graduate School of Information Science and Technology, Osaka University, 1-1 Yamadaoka, Suita, Osaka 565-0871, Japan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Knowledge discovery from databases requires thorough pre-examination to ensure that the granulated datasets produced by converting continuous databases are consistent. Different granulation techniques may produce different numbers of conflicting data in a granulated dataset. This work examines and compares the performance of two granulation techniques, equal width interval (EWI) and equal frequency interval (EFI). It also explores the relationship between granulation performance and dataset size, number of attributes, and number of classes. Eighteen continuous datasets are examined. Experimental results indicate that (1) of the 18 datasets examined, 7 contained conflicting data under EWI and 8 under EFI, suggesting that almost 40% of the granulated datasets contained conflicting data; (2) almost 22% of the datasets had more than 20% conflicting data; (3) no notable difference existed between EWI and EFI with respect to granulation performance; (4) the production of conflicting data by EWI and EFI did not differ remarkably with dataset size or number of classes; and (5) datasets with more than 12 attributes yielded fewer conflicting data under both EWI and EFI.
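
The two granulation techniques compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tie handling in the equal frequency case, and the toy data are all assumptions made for illustration. In particular, the sketch assumes that a record counts as conflicting when it shares all of its granulated attribute values with at least one record of a different class; the paper's exact counting rule may differ.

```python
from collections import defaultdict


def equal_width_bins(values, k):
    """EWI: split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """EFI: split the sorted values into k intervals of roughly equal count.

    Assumption: equal values always share the bin of their first occurrence
    in sorted order, so ties never straddle an interval boundary.
    """
    n = len(values)
    first_bin = {}
    for rank, v in enumerate(sorted(values)):
        first_bin.setdefault(v, rank * k // n)
    return [first_bin[v] for v in values]


def granulate(records, k, binner):
    """Apply one binning function to every attribute (column) of the records."""
    columns = list(zip(*records))
    binned = [binner(list(col), k) for col in columns]
    return list(zip(*binned))


def count_conflicts(granulated, labels):
    """Count records whose granulated attributes match a record of another class."""
    groups = defaultdict(list)
    for row, label in zip(granulated, labels):
        groups[tuple(row)].append(label)
    return sum(len(ls) for ls in groups.values() if len(set(ls)) > 1)


# Hypothetical toy data: one continuous attribute, three classes.
records = [(1.0,), (1.2,), (1.4,), (2.0,), (9.0,), (10.0,)]
labels = ["a", "a", "b", "b", "c", "c"]

ewi_conflicts = count_conflicts(granulate(records, 3, equal_width_bins), labels)
efi_conflicts = count_conflicts(granulate(records, 3, equal_frequency_bins), labels)
print("EWI conflicts:", ewi_conflicts)
print("EFI conflicts:", efi_conflicts)
```

On this skewed toy attribute, EWI places the four smallest values in one interval and so mixes classes "a" and "b", giving 4 conflicting records, while EFI gives none. This only shows how the two techniques can diverge on a single attribute; it says nothing about the 18 datasets in the study, where the abstract reports no notable difference between EWI and EFI.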