Likelihood-based sampling from databases for rule induction methods

  • Authors:
  • Shusaku Tsumoto;Shoji Hirano;Hidenao Abe

  • Affiliations:
  • Department of Medical Informatics, Faculty of Medicine, Shimane University, Izumo, Japan;Department of Medical Informatics, Faculty of Medicine, Shimane University, Izumo, Japan;Department of Medical Informatics, Faculty of Medicine, Shimane University, Izumo, Japan

  • Venue:
  • RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology
  • Year:
  • 2010

Abstract

This paper introduces the log-likelihood ratio as a measure of the similarity between generated training samples and the original training samples. The ratio is used as a test statistic to determine whether the statistical information of the generated training samples (Sk) is almost equivalent to that of the original training samples (S0), denoted by S0 ≃ Sk. If the test statistic rejects the hypothesis S0 ≃ Sk, the generated samples are discarded. Otherwise, they are accepted, and rule induction methods or statistical methods are applied to them. The method was evaluated on three medical domains. The results show that the proposed method selects training samples that reflect the statistical characteristics of the original training samples, although its performance with small samples is weaker.
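
The accept-or-discard loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the statistic, so this sketch assumes a standard G-test (log-likelihood ratio goodness-of-fit test) on a single categorical attribute, with the proportions in S0 taken as the reference distribution. The function names, the 5% critical values, and the retry limit are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter

# Chi-square critical values at the 5% level, keyed by degrees of freedom.
# Assumption: a fixed 5% significance level; the paper may use another one.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def g_statistic(original, sample):
    """Log-likelihood ratio (G) statistic of `sample` measured against the
    category proportions observed in `original`."""
    n0, nk = len(original), len(sample)
    reference = Counter(original)
    g = 0.0
    for value, observed in Counter(sample).items():
        expected = nk * reference[value] / n0
        if expected == 0:
            # The sample contains a category that never occurs in S0.
            return math.inf
        g += observed * math.log(observed / expected)
    return 2.0 * g


def is_accepted(original, sample, critical_value):
    """Accept the sample when the hypothesis S0 ≃ Sk is not rejected."""
    return g_statistic(original, sample) <= critical_value


def draw_accepted_sample(original, size, rng, max_tries=100):
    """Draw random subsamples of S0 until one passes the test.

    Returns None if no candidate is accepted within `max_tries` draws.
    """
    degrees_of_freedom = len(set(original)) - 1
    critical_value = CHI2_CRITICAL_05[degrees_of_freedom]
    for _ in range(max_tries):
        candidate = rng.sample(original, size)
        if is_accepted(original, candidate, critical_value):
            return candidate
    return None


if __name__ == "__main__":
    s0 = ["positive"] * 60 + ["negative"] * 40
    sk = draw_accepted_sample(s0, size=20, rng=random.Random(0))
    print(Counter(sk) if sk is not None else "no sample accepted")
```

An accepted sample would then be passed to the rule induction or statistical method. With real data there are usually several attributes, so the statistic would need to be computed per attribute or over the joint distribution; the sketch leaves that design choice open.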