CLIP4: hybrid inductive machine learning algorithm that generates inequality rules

  • Authors:
  • Krzysztof J. Cios; Lukasz A. Kurgan

  • Affiliations:
  • Dept. of Comp. Sci. and Eng., Univ. of Colorado at Denver, Campus Box 109, P.O. Box 173364, Denver and Dept. of Comp. Sci., Univ. of Colorado at Boulder and Univ. of Colorado Health Sci. Ctr., Den ...; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2V4, Canada

  • Venue:
  • Information Sciences: an International Journal - Special issue: Soft computing data mining
  • Year:
  • 2004

Abstract

The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from the subsets stored at the leaf nodes. The unique feature of the algorithm is the generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. For each data set, the benchmarking results report CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features such as tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and it ranks attributes and selectors, which can be used for feature selection.
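
To make the idea of a rule that involves inequalities concrete, the following is a minimal Python sketch. It is not the CLIP4 algorithm itself: the rule representation, the function names `rule_covers` and `coverage`, and the toy weather-style data are all assumptions introduced here for illustration. It only shows how a rule built as a conjunction of "attribute is not equal to value" selectors could be evaluated against examples.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not taken from the paper):
# a selector is an (attribute, value) pair read as "attribute != value",
# and a rule is a conjunction of such selectors.
Selector = Tuple[str, str]
Rule = List[Selector]
Example = Dict[str, str]


def rule_covers(rule: Rule, example: Example) -> bool:
    """Return True when every inequality selector holds for the example."""
    return all(example[attr] != value for attr, value in rule)


def coverage(rule: Rule, examples: List[Example]) -> List[Example]:
    """Return the examples that satisfy all selectors of the rule."""
    return [ex for ex in examples if rule_covers(rule, ex)]


# Toy data, invented for this sketch.
examples = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]

# Rule: IF outlook != rain AND windy != yes THEN <some class>
rule = [("outlook", "rain"), ("windy", "yes")]

covered = coverage(rule, examples)
for ex in covered:
    print(ex)
```

A single inequality selector can stand in for several equality conditions on a multi-valued attribute: in this toy data, `outlook != rain` covers both the `sunny` and the `overcast` examples at once.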