C4.5: programs for machine learning
Machine Learning
This paper introduces a log-likelihood ratio to measure the similarity between generated training samples and the original training samples. The ratio serves as a test statistic for deciding whether the statistical information of the generated training samples (Sk) is almost equivalent to that of the original training samples (S0), denoted S0 ≃ Sk. If the test statistic rejects the hypothesis S0 ≃ Sk, the generated samples are discarded. Otherwise, they are accepted and rule induction or statistical methods are applied to them. The method was evaluated on three medical domains. The results show that it selects training samples that reflect the statistical characteristics of the original training samples, although its performance on small samples is weaker.
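The accept/reject step can be illustrated with a minimal sketch. The abstract does not give the exact form of the statistic, so this assumes a single categorical attribute and the standard G statistic, G = 2 Σ O_i ln(O_i / E_i), with expected counts E_i taken from the category frequencies of S0. The function names `g_statistic` and `accept_sample` and the `critical_value` parameter are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def g_statistic(original, generated):
    """Log-likelihood ratio (G) statistic of `generated` (Sk) against the
    category frequencies observed in `original` (S0).

    Assumption: one categorical attribute; the paper's own statistic may differ.
    """
    original_counts = Counter(original)
    observed_counts = Counter(generated)
    n0 = len(original)
    nk = len(generated)

    g = 0.0
    for category, observed in observed_counts.items():
        if category not in original_counts:
            # A value never seen in S0 has expected count 0, so the
            # statistic is unbounded and the sample cannot match S0.
            return math.inf
        expected = nk * original_counts[category] / n0
        g += 2.0 * observed * math.log(observed / expected)
    return g


def accept_sample(original, generated, critical_value):
    """Keep Sk when the hypothesis S0 ≃ Sk is not rejected, i.e. when the
    statistic does not exceed the chosen chi-square critical value."""
    return g_statistic(original, generated) <= critical_value


# Example: two classes, so 1 degree of freedom; 3.841 is the chi-square
# critical value at the 0.05 level for 1 degree of freedom.
s0 = ["pos"] * 50 + ["neg"] * 50
balanced = ["pos"] * 10 + ["neg"] * 10
skewed = ["pos"] * 19 + ["neg"] * 1

print(accept_sample(s0, balanced, 3.841))  # kept
print(accept_sample(s0, skewed, 3.841))    # discarded
```

With real data the critical value would depend on the number of categories and the chosen significance level, and small samples make the chi-square approximation unreliable, which fits the weaker small-sample performance the abstract reports.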